PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Stat Med. Author manuscript; available in PMC 2010 September 30.
Published in final edited form as:
Stat Med. 2009 September 30; 28(22): 2748–2768.
doi:  10.1002/sim.3640
PMCID: PMC2773213
NIHMSID: NIHMS137898

Assessing Cumulative Incidence Functions under the Semiparametric Additive Risk Model

SUMMARY

In analyzing competing risks data, a quantity of considerable interest is the cumulative incidence function. Often the effect of covariates on the cumulative incidence function is modeled via the proportional hazards model for the cause-specific hazard function. As the proportionality assumption may be too restrictive in practice, we consider an alternative more flexible semiparametric additive hazards model of [1] for the cause-specific hazard. This model specifies the effect of covariates on the cause-specific hazard to be additive as well as allows the effect of some covariates to be fixed and that of others to be time-varying. We present an approach for constructing confidence intervals as well as confidence bands for the cause-specific cumulative incidence function of subjects with given values of the covariates. Furthermore, we also present an approach for constructing confidence intervals and confidence bands for comparing two cumulative incidence functions given values of the covariates. The finite sample property of the proposed estimators is investigated through simulations. We conclude our paper with an analysis of the well-known malignant melanoma data using our method.

1. INTRODUCTION

Competing risks data are typically encountered in medical studies, for example, in studies dealing with time to progression of spontaneous labor, where labor due to medical intervention (example: delivery by cesarean) or membrane rupture leading to labor are treated as other causes. Another example can be found in the well-known malignant melanoma data [2], where patients with malignant melanoma were either at risk to die from malignant melanoma or from other causes. Typically response to a treatment can be classified in terms of failure from disease of interest and/or non-disease related causes. So in competing risks framework, each individual is exposed to K distinct types of risks and the eventual failure can be attributed to precisely one of the risks. Suppose each subject has an underlying continuous failure time T that may be subject to censoring. The cause of failure J [set membership] {1, … , K} is observed along with covariate information. The cause-specific hazard function for a subject with a covariate vector z is defined by

λk(tz)=limΔt01ΔtP(tT~<t+Δt,J=kT~t,z)

for k = 1, … , K. The cause-specific hazard function λk(t|z) is the instantaneous rate of occurrence of the k-th failure cause at time t in the presence of all causes of failure given z. To predict the survival probability of the k-th type of failure time with a particular set of covariates z, Sk(tz)=exp(0tλk(uz)du) is not meaningful as it has no simple probability interpretation when the competing risks are dependent [3]. The more appropriate function is the cumulative incidence function, defined by

Fk(tz)=P(T~t,J=kz).

The cumulative incidence function Fk(t|z) is the probability of a subject failing from cause k at time t in the presence of all the competing risks given z.

Note that Fk(t|z) can be expressed in terms of λl(t|z), where l = 1, … , K. It is easy to show that

Fk(tz)=0tS(uz)λk(uz)du

with the overall survival function S(tz)=exp(0tΣl=1Kλl(uz)du) Thus, it is natural to estimate the cumulative incidence function through the cause-specific hazard function. Typically, one models the effect of covariates using the proportional hazards model ([4], [5]). Cheng et al. have studied the estimation of the cumulative incidence function based on Cox's regression model in a competing risks model [6]. However, the proportionality assumption under the proportional hazards model may be too restrictive in practice. For instance, the proportionality assumption for the malignant melanoma data fails based on the score test provided by [7] (see the numerical section for further details). Thus, it is of interest to investigate alternative approach and an important alternative is the additive risk model.

Shen and Cheng [8] presented an approach to estimating the cumulative incidence function under the additive risk model first proposed by [9]

λk(tz)=λ0k(t)+zTβk,
(1)

where λ0k(·) is an unspecified baseline hazard function for the k-th failure type and βk an unknown parameter vector. Often, this model only provides a rough summary of the effect of covariates, because in their model the influence of all the covariates was restricted to be constant, i.e., time-varying effects of covariates cannot be captured by this model. Aalen et al. [10] have considered a fully nonparametric additive model for the cause-specific transition hazards, where the effects of all the covariates are assumed to be time-varying in dealing with Markov chain models.

In many instances in practice, we may know that some of the covariates will have time-constant effects (for example, demographic characteristics) and some may be suspected to have time-varying effect (for example, treatment). In particular, using the test provided by [11] for checking the time-varying effect or constant effect of a covariate based on the additive model , we found that age and sex have time constant effect, and thickness of the tumor has time-varying effect for the malignant melanoma data. Motivated by this, we study a flexible additive risk model which allows for some covariates to be modeled parametrically and others to be modeled nonparametrically. The suggested approach is to study the semiparametric additive model based on [1] in the competing risks setup which is more appropriate in some applications.

To be specific, under the semiparametric additive risk model the cause-specific hazard function for cause k, given covariates x and z takes the form

λk(tx,z)=xTαk(t)+zTβk,

where the covariates are partitioned into x, a p-dimensional covariate vector with time-varying effects and z, a q-dimensional covariate vector with time constant effects, αk(t) is a p-dimensional locally integrable function, and βk is a q-dimensional regression vector. The first component of x may be set to 1 to allow for a general baseline hazard. When some covariates are known or anticipated to yield time-varying effects, they will be investigated nonparametrically. In this paper, we present a way to construct confidence interval for the cumulative incidence function for a specific cause given specific value of covariates. We also present an approach to constructing confidence intervals and bands for cumulative incidence functions under semi-parametric additive model. Furthermore, we also propose methods to compare two cumulative incidence functions by constructing confidence interval and band of their ratio or difference.

The paper is structured as follows. In Section 2 we present our methods for constructing confidence intervals and bands for the cumulative incidence function and for the difference and the ratio of two cumulative incidence functions. The finite sample property of the proposed estimators is investigated extensively through simulations in Section 3. In Section 4 we illustrate the proposed method with data from malignant melanoma study. The details of our asymptotic results are made explicit in the Appendix.

2. Estimation of the Cumulative Incidence Functions

Suppose that there are K distinct failure types. Let Tki be the k-th latent failure time for individual i, where k = 1, … , K and i = 1, … , n. τ is the end of follow-up. Then, one can only observe (Ti, Ji, δi, xi, zi), where Ti = min(Ti, Ci), Ti = mink{Tki, k = 1, … , K} and (xi, zi) are covariates for the i-th subject. Here, Ji is the failure type indicator (Ji = k if Ti = Tki), δi is the censoring indicator (δi = 1 if Ti is observed and 0 otherwise), and Ci is the independent censoring variable. We assume that Tki does not equal Tli for kl, k, l = 1, … , K. In this paper, we assume that the cause-specific hazard function for Tki, given covariates xi and zi is modeled by

λki(txi,zi)=xiTαk(t)+ziTβk,
(2)

for k [set membership] {1, 2, … ,K}. Let Yi(t) = I(Tit) indicate the at-risk process of the i-th individual, where I(·) is the indicator function. We organize the covariates into design matrices

X(t)=(Y1(t)x1,,Yn(t)xn)TandZ(t)=(Y1(t)z1,,Yn(t)zn)T.

Let δki = δiI(Ji = k). Then Nki(t) = δkiI(Tit) indicates whether an event of type k has been observed by time t for individual i. Let

Nk(t)=(Nk1(t),,Nkn(t))Tandλk(t)=(λk1(t),,λkn(t))T

be the n-dimensional counting process and its intensity. Denote by ωk(t), an n × n matrix given by diag {1/λk1, … , 1/λkn} for k [set membership] {1, … , K}. Estimation of (2) for the cause-specific function can be carried out by using the estimation procedures proposed in [1]. Each βk and the cumulative time-varying effect Ak(t)=0tαk(u)du, k = 1, … , K can be consistently estimated by [beta]k and Âk(t), while treating all the failure time Ti with Jik as censored observation. To be specific,

β^k=[0τZT(t)Hk(t)Z(t)dt]10τZT(t)Hk(t)dNk(t)

and

A^k(t)=0tXk(u)(dNk(u)Z(u)β^kdu),

where Hk(t) = ωk(t) − ωk(t)X(t)Wk(t)−1XT (t)ωk(t), Xk(t)=Wk(t)1XT(t)ωk(t), and Wk(t) = XT (t)ωk(t)X(t).

However, the above estimates involve the weight function ωk(t). As suggested in [1], one can replace ωk(t) by an identity matrix and calculate the estimates for βk and Ak(·) and use them to estimate the weight matrix ωk(·) and calculate the consistent estimates [beta]k and Âk (·).

By combining these we can consistently estimate the cumulative incidence function for the cause 1, for example, with a particular set of covariates x0 and z0 by

F^1(tx0,z0)=0tS^(ux0,z0)dΛ^1(ux0,z0)

where S^(tx0,z0)=exp(Σk=1KΛ^k(tx0,z0)) is an estimator of the overall survival probability S(t|x0, z0) and [Lambda with circumflex]k is the estimator of the cumulative cause-specific hazard function Λ^k(tx0,z0)=0tx0TdA^k(u)+tz0Tβ^k. We next present our approach for constructing confidence interval and confidence band for a specific cause-k cumulative incidence function conditional on specific values of the covariate.

2.1. Confidence interval (band) for a specific cumulative incidence function

In the Appendix, we show that n(F^1(tx0,z0)F1(tx0,z0)) is asymptotically equivalent to a sum of square integrable martingales U1(t|x0, z0) given by

U1(tx0,z0)=1nΣi=1n1i(tx0,z0)=1nΣi=1n{0tS(ux0,z0)x0T(n1W1(u))1ω1i(u)xidM1i(u)}0tS(ux0,z0)x0TX1(u)Z(u)duC11D1i0tS(ux0,z0)duz0TC11D1iΣk=1K(0tF1C(t,u)x0T(n1Wk(u))1ωki(u)xidMki(u))0tF1C(t,u)x0Tωk(u)X(u)Z(u)duCk1Dki{(0tF1C(t,u)duz0TCk1Dki)},
(3)

where

FkC(t,u)=Fk(tx0,z0)Fk(ux0,z0)Ck=1n0τZT(t)Hk(t)Z(t)dtDki=0τ{ziωki(t)ZT(t)ωk(t)X(t)Wk(t)1ωki(t)xi}dMki(t)Mki(t)=Nki(t)0tYi(u){xiTαk(u)+ziTβk}du

for k [set membership] {1, 2, … , K} and i = 1, 2, … , n. Furthermore using the martingale central limit theorem, it converges in distribution to a Gaussian process. The martingale representation of U1(t|x0, z0) can also be used to construct a consistent estimator of the asymptotic variance function. The variance function at time t can be consistently estimated by

σ^12(tx0,z0)=1nΣi=1n(^1i(tx0,z0))2

obtained by replacing all the terms with their empirical version and the parameters βk and Ak by their consistent estimates [beta]k and Âk.

Next, combining the asymptotic normality of F1(t|x0, z0) and the consistent estimator σ^12(tx0,z0) for the asymptotic variance, we can construct an (1−α)×100% confidence interval for F1(t; x0, z0) as follows:

g1(g(F^1(tx0,z0))±n12g(F^1(tx0,z0))σ^1(tx0,z0)zα2),

where g is a smooth function chosen such that it retains the range of the distribution F1, g′ its continuous derivative and g−1 denotes the inverse function of g. Typically, the transformation g is chosen for constructing confidence interval for distributions so as to retain their range as well as to improve the coverage probability. Observe that using the functional delta method, the asymptotic normality of g(F1(t|x0, z0)) follows from that of F1(t|x0, z0), but with asymptotic variance given by (g(F1(tx0,z0)))2σ12(tx0,z0).

Another quantity of interest is the confidence band for the cumulative incidence function. However, in order to construct an (1−α)×100% confidence band for F1(t|x0, z0) simultaneously for t [set membership] [0, τ], we need to investigate the distribution of the supremum of the process U1(t|x0, z0), 0 ≤ tτ. This is analytically challenging as U1 does not have an independent increment structure. Alternatively, we adapt the general procedure suggested by [7] to get an approximation of the distribution of the process U1. We approximate the distribution of U1(t|x0, z0) by a zero-mean Gaussian process, denoted by Û1(t|x0, z0), whose distribution can be easily generated through simulations. We replace {Mki(t)} in (3) with {Nki(t)Gi}, and the other unknown quantities in (3) with their respective sample estimators to yield Û(t|x0, z0), where {Gi : i = 1, … , n} are independent standard normal variables. The asymptotic equivalence of U1(t|x0, z0) and Û1(t|x0, z0) is established by the fact that

E{Mki(t)}=0,Var{Mki(t)}=E{Nki(t)}.

To approximate the distribution of U1(t|x0, z0), we simply obtain a large number, say M, of realizations from Û1(t|x0, z0) by repeatedly generating random samples {Gi}, while fixing the data {(Ti, δki, xi, zi) : k = 1, … , K, i = 1, … , n} at their observed values.

An (1 − α) × 100% confidence band for F1(t|x0, z0) on [t1, t2] is given by

F^1(tx0,z0)±n12cασ^1(tx0,z0),

where the cutoff value cα is given by

P[supt1tt2U^1(tx0,z0)σ^1(tx0,z0)>cα]=α

which can be estimated through replicates of Û 1(t|x0, z0) given the observed data. Here t1 and t2 (t1 < t2) can be any time points between (0, t0), where t0 = inf{t : EY1(t) = 0}.

2.2. Comparison of two cumulative incidence functions

In many practical situations, one is interested in comparing cumulative incidence functions for a specific cause given two different values of covariates or comparing cumulative incidence functions for two different causes given a specific value of the covariate. For instance in malignant melanoma data, the difference in cumulative incidence functions for malignant melanoma between females and males measures the difference in the probability of dying from malignant melanoma between females and males. Similarly, the difference between cumulative incidence functions at different values of tumor thickness shows how the probability of dying from malignant melanoma is changed by tumor thickness. While the difference of two cumulative incidence functions measures additional probability of survival/dying from a cause, the ratio shows a relative rate of survival/dying from a cause at two different covariate values.

We begin by presenting a way to compare cumulative incidence functions for k-th cause, say, the first cause given two different values for covariates (x, z) = (x1, z1) and (x, z) = (x2, z2). We will first present the method for calculating the confidence interval and confidence band for their ratio. Using the consistency of Fk and boundedness in probability of n(Fk(tx,z)Fk(tx,z)) for t [set membership] [0, τ] established in previous section, it follows that

n(F^1(tx1,z1)F^1(tx2,z2)F1(tx1,z1)F1(tx2,z2))1F1(tx2,z2)n(F^1(tx1,z1)F1(tx1,z1))F1(tx1,z1)F12(tx2,z2)n(F^1(tx2,z2)F1(tx2,z2)),

where ≈ denotes asymptotic equivalence. For notational simplicity, we denote by a(t) = 1/F1(t|x2, z2) and by b(t)=F1(tx1,z1)F12(tx2,z2). Using similar arguments as in (3), it follows that n(F^1(tx1,z1)F^1(tx2,z2)F1(tx1,z1)F1(tx2,z2)) is asymptotically equivalent to a sum of square integrable martingales given by

U~1(tx1,z1,x2,z2)=1nΣi=1n~i(tx1,z1,x2,z2)=1nΣi=1n{0t(a(u)S(ux1,z1)x1b(u)S(ux2,z2)x2)T(n1W1(u))1ω1i(u)xidM1i(u)}0t(a(u)S(ux1,z1)x1b(u)S(ux2,z2)x2)TX1(u)Z(u)duC11D1i0t(a(u)S(ux1,z1)z1b(u)S(ux2,z2)z2)TduC11D1iΣk=1K(0t(a(u)F1C(t,ux1,z1)x1b(u)F1C(t,ux2,z2)x2)T(n1Wk(u))1ωki(u)xidMki(u))0t(a(u)F1C(t,ux1,z1)x1b(u)F1C(t,ux2,z2)x2)TXk(u)Z(u)duCk1Dki{(0t(a(u)F1C(t,ux1,z1)z1b(u)F1C(t,ux2,z2)z2)TduCk1Dki)},
(4)

where F1C(t,ux1,z1)=F1(tx1,z1)F1(ux1,z1),F1C(t,ux2,z2)=F1(tx2,z2)F1(ux2,z2), Ck, ωk and Dki are defined in Section 2.1. Using the martingale central limit theorem, the above process converges in distribution to a Gaussian process. Furthermore, using the i.i.d. representation of Ũ(t|x1, z1, x2, z2) we can construct a consistent estimator of the asymptotic variance function. The variance function at time t can be consistently estimated by

σ~^2(tx1,z1,x2,z2)=1nΣi=1n(~^i(tx1,z1,x2,z2))2
(5)

obtained by replacing all the terms in (4) with their empirical version and Fk(t, xi, zi) by their consistent estimator Fk(t, xi, zi) and the parameters βk and Ak by their consistent estimates [beta]k and Âk.

Next, to construct an (1 − α) × 100% confidence band for F1(t|x1, z1)/F1(t|x2, z2) for t [set membership] [t1, t2], we propose a Gaussian multiplier approach. Let

U~^(tx1,z1,x2,z2)=1nΣi=1ni~^(tx1,z1,x2,z2)Gi,

where {Gi : i = 1, …, n} are independent standard normal random variable. Observe that U~^(tx1,z1,x2,z2) is asymptotically equivalent to Ũ(t|x1, z1, x2, z2). We can construct an (1 − α) × 100% confidence band for the ratio F1(t|x1, z1)/F1(t|x2, z2) on [t1, t2] as follows:

F^1(tx1,z1)F^1(tx2,z2)±n12cασ~^(tx1,z1,x2,z2),

where the cut-off value cα given by

P[supt1tt2U~^(tx1,z1,x2,z2)σ~^(tx1,z1,x2,z2)>cα]=α.

The cut-off value can be estimated by approximating the distribution of Ũ(t|x1, z1, x2, z2) by obtaining a large number of realizations of U~^(tx1,z1,x2,z2) given the observed data by repeatedly generating random samples of {Gi : i = 1, …, n}.

Similarly, we can construct confidence interval and confidence band for the difference between cumulative incidence functions for a specific cause-k given different values of covariates, i.e., for Fk(t|x1, z1) − Fk(t|x2, z2). Notice that

n(F^1(tx1,z1)F1(tx1,z1))n(F^1(tx2,z2)F1(tx2,z2))

is asymptotically equivalent to 1nΣi=1n~i(tx1,z1,x2,z2) with a(t) = b(t) = 1 for all t [set membership] [0, t0]. Hence, an (1 − α) × 100% confidence interval and confidence band in t [set membership] [t1, t2] for the difference can be constructed as

(F^1(tx1,z1)F^1(tx2,z2))±n12cασ~^(tx1,z1,x2,z2),

where cα and σ~^() can be calculated as mentioned above with a(t) = 1 and b(t) = 1 for t [set membership] [0, t0] with t0 = inf{t : EY1(t) = 0}.

Next, we discuss constructing confidence interval and band for comparing cumulative incidence functions for two different causes given a specific value for the covariate. We begin by considering their ratio i.e., F1(t|x0, z0)/F2(t|x0, z0). Using the consistency of Fk and boundedness in probability of n(Fk(tx,z)Fk(tx,z)) for t [set membership] [0, t0] established in previous section, it follows that

n(F^1(tx0,z0)F^2(tx0,z0)F1(tx0,z0)F2(tx0,z0))1F2(tx0,z0)n(F^1(tx0,z0)F1(tx0,z0))F1(tx0,z0)F22(tx0,z0)n(F^2(tx0,z0)F2(tx0,z0)).

For notational simplicity, we denote by a(t) = 1/F1(t|x0, z0) and by b(t)=F1(tx0,z0)F22(tx0,z0). Using similar arguments as in (4), we can show that n(F^1(tx0,z0)F^2(tx0,z0)F1(tx0,z0)F2(tx0,z0)) is asymptotically equivalent to a sum of square integrable martingales given by

U~(tx0,z0)=1nΣi=1n~i(tx0,z0)=1nΣi=1n{0tS(ux0,z0)x0T(a(u)(n1W1(u))1ω1i(u)xidM1i(u)}b(u)(n1W2(u))1ω2i(u)xidM2i(u))0tS(ux0,z0)x0T(a(u)X1(u)Z(u)C11D1ib(u)X2(u)Z(u)C21D2i)du0tS(ux0,z0)z0T(a(u)C11D1ib(u)C21D2i)duΣk=1K(0t(a(u)F1C(t,u)b(u)F2C(t,u))x0T(n1Wk(u))1ωki(u)xidMki(u))0t(a(u)F1C(t,u)b(u)F2C(t,u))x0TXk(u)Z(u)duCk1Dki{(0t(a(u)F1C(t,u)b(u)F2C(t,u))duz0TCk1Dki)}.
(6)

Using the martingale central limit theorem, the above process converges in distribution to a Gaussian process. Furthermore, using the martingale representation of Ũ(t|x0, z0) we can construct a consistent estimator of the asymptotic variance function. The variance function at time t can be consistently estimated by

σ~^2(tx0,z0)=1nΣi=1n(~^i(tx0,z0))2

obtained by replacing all the terms with their empirical version and Fk(t|x0, z0) by their consistent estimator Fk(t|x0, z0) and the parameters βk and Ak by their consistent estimates [beta]k and Âk. The rest is similar to constructing an (1 − α) × 100% confidence interval and band for comparing cumulative incidence functions for a cause-k given two different values of covariates.

3. Simulation Study

We investigate the finite sample properties of the various proposed confidence intervals and confidence bands for the cumulative incidence function(s) and apply the methods in analyzing a real data. We first begin with simulation studies conducted to evaluate the performance of our proposed method. We generated a semiparametric additive risks model (2) with two causes. The covariate x associated with the time-varying parameter was chosen to be a 0/1 variable each with probability .5. We also included a covariate z, associated with the time constant effect generated from uniform distribution on [0, 1]. The K = 2 latent failure times were taken to be conditionally independent with cause-specific hazard functions given by

λk(tx,z)=xαk(t)+zβk

where for k = 1 (cause 1), α1(t) = 2.0 for 0 ≤ t ≤ 0.5 and 1.0 for t > 0.5 and β1 = 1.0; for k = 2 (cause 2), α2(t) = 1.0 for t > 0 and β2 = 1.5. The censoring C was taken to be exponentially distributed with the rate parameter adjusted, so that 30% and 50% of the observations were censored. We simulated data of sample sizes n = 100 and n = 250, each replicated 1000 times. In order to illustrate the performance of the proposed estimator of the cumulative incidence functions, in Figure 3 we present a plot of the estimates of cumulative incidence function for cause 1 for covariate value x = 1 and z = .5 under the above model based on 50 replicates of the sample. The estimates are based on the sample sizes 100 and 250 with 30% censoring, respectively. Figure 3 shows that the estimates tend to get closer to the true cumulative incidence function as the sample size increases.

Figure 3
Predicted cumulative incidence function of malignant melanoma death for a 52-year-old female with ulceration and tumor thickness of 6.76 mm with 95% confidence intervals and bands.

Next, in order to examine the performance of the estimates of cumulative incidence functions and their confidence intervals at time points t for specific value of covariate, we examined their performance for t in .1, .2, .3, .4, .5 and .6, where .60 is close to the 70-th percentile of the underlying distribution for the time to first cause, i.e., k = 1. We also investigated the performance of confidence band for t [set membership] [.1, .6]. The numbers reported in Table I are the bias, sampling standard deviation, estimated asymptotic standard deviation of the predicted cumulative incidence function for the cause 1 for covariates (x, z) = (1, .5) at the time points mentioned above. Table I also presents the coverage probability of the 95% confidence interval for the cumulative incidence function for cause-1 at the above mentioned time points as well as the coverage probability for its 95% confidence band in the last column. These results are based on 1000 replications for each of the sample sizes (n = 100, 250) and censoring percentages (30% and 50%).

Table I
Estimation of cumulative incidence function for cause 1 at (x, z) = (1, .5).

The numbers indicate overall small biases of the proposed cumulative incidence functions and that their sampling standard deviations decrease as the sample sizes increase for each time point and each censoring level. The performance remains same when the censoring percentage increases from 30% to 50%. The estimated asymptotic standard deviations are very close to the sampling standard deviations indicating the appropriateness of the finite sample approximations to calculate the estimate for the asymptotic standard deviation. Furthermore, the coverage probabilities for 95% confidence intervals for the cumulative incidence function is good for n = 100 and gets very close to the nominal level of 95% as the sample size increases to n = 250 for both censoring percentages 30% and 50%, respectively. We next studied the performance of proposed estimators for comparing cumulative incidence functions given different values of the covariates. In Table II, we present the bias, sampling standard deviation, estimated asymptotic standard deviation and the coverage probability of the 95% confidence interval for the difference between cumulative incidence functions for cause-1 at (x1, z1) = (1, .5) with that at (x2, z2) = (0, .5), as well as its confidence band for t [set membership] [.1, .6]. Overall, the performances are similar to that found in Table I indicating that the proposed estimation procedure for constructing confidence intervals and bands for comparing cumulative incidence functions seems to perform very well even with the weight function being chosen to be identity. Our experience with choice of weight function based on λ indicates that the performance improves from the point of view of efficiency and consequently the coverage probability.

Table II
Estimation of the difference between cumulative incidence functions for cause 1 at (x1, z1) = (1, .5) and (x2, z2) = (0, .5).

4. A real data application: Malignant melanoma study

We now illustrate our methodology with an analysis of data on 205 malignant melanoma patients observed during the years 1962 to 1977 [2]. Among these 205 patients, 57 patients died from malignant melanoma, 14 patients died from causes other than malignant melanoma, and the remaining 134 patients were alive at the end of follow-up. The covariates considered here were tumor thickness (mean: 2.92 mm; standard deviation: 2.96 mm), ulceration status (90 present and 115 not present), age (mean: 52 years; standard deviation: 17 years) and gender (79 male and 126 female). In our analysis, the covariates of tumor thickness and age were standardized. The main interest in this melanoma study is to predict the patient-specific cumulative incidence probability for the melanoma death.

We first began with fitting the proportional hazards model to the data and used the score process test proposed by [7] to test for proportionality of the underlying covariates of interest. We found that the proportionality assumption for age and sex were reasonable with p-values of 0.554 and 0.380, respectively. However, the p-values for the proportionality test for tumor thickness and ulceration status resulted in p-values of 0.032 and 0.008, respectively, indicating lack of fit. Thus this indicates that the Cox's model may be inappropriate here.

We next fitted the additive model (1) where all covariates of the model had nonparametric time-varying effects. Based on the results not presented here by applying the testing procedures of [11], the estimates indicated that gender and age did not have significant time-varying effects. So, based on this preliminary analysis, our final model considers the effect of gender and age as time constant effects, and tumor thickness and ulceration status as time-varying nonparametric effects. We used the weight function introduced in Section 2 throughout this analysis as we found an improvement in efficiency with the usage of the weights in comparison to the estimates based on weight matrix being identity. Based on this final model, the cumulative effects of the baseline estimate, tumor thickness, and ulceration, with 95% pointwise confidence intervals and bands, are given in Figure 4. The cumulative regression function for tumor thickness shows a positive effect in the first 4 years, which then seems to flatten out. Ulceration shows a positive effect within the first 4 years and a slower effect after that. The 95% confidence interval of the constant coefficients for gender and age are (−.1021, .1450) and (−.0535, .0691), respectively.

Figure 4
Predicted difference between two cumulative incidence functions of malignant melanoma death at three sets of covariate values specified in the last paragraph of Section 4.

Next, we estimated the cumulative incidence function of malignant melanoma for a 52-year-old female patient with ulceration and tumor thickness of 6.76 mm (90-th percentile of tumor thickness), and display its 95% confidence intervals and bands under the semiparametric additive model in Figure 4. We used a transformation of g(·) = log(−log(·)).

Finally, in Figure 4 we estimate the difference between the cumulative incidence functions for malignant melanoma for different values of covariates. In Figure 4(a), in order to ascertain the effect of tumor thickness on the cumulative incidence function, we estimated the difference between cumulative incidence functions for a female of 52 years of age with ulceration and tumor thickness of one standard deviation above the observed mean tumor thickness with that for a female of same age with ulceration and tumor thickness at the mean value. In Figure 4(b), in order to ascertain the effect of ulceration on cumulative incidence function, we estimate the difference between cumulative incidence functions for a 52 year old female with mean value of tumor thickness and with ulceration with that for a 52 year old female with mean value of tumor thickness but no ulceration. In Figure 4(c), in order to ascertain the gender effect on cumulative incidence function, we estimated the difference between a 52 year old male with ulceration and mean tumor thickness with that of a 52 year old female with ulceration and mean tumor thickness. Observe that for a 52 years old female with ulceration, Figure 4(a) estimates the additional probability of dying from malignant melanoma over time for those with tumor thickness of 5.88 mm (one standard deviation above the mean) compared to those with tumor thickness at 2.92 mm (the mean). For a 52 years old female with mean value of tumor thickness, Figure 4(b) shows the estimated increase in probability of dying due to malignant melanoma for those with ulceration compared to those without. Figure 4(c) indicates that men have higher probability of dying due to malignant melanoma than women with ulceration and with same mean level of tumor thickness.

5. Discussion

In this paper, we have proposed to use the semiparametric additive model for the prediction and comparison of cause-specific cumulative incidence functions for given values of the covariate. This model adds to the existing approaches for analyzing competing risks data which include multiplicative model of [6], Aalen's additive model (with only fixed effect of covariates) of [8] and Aalen's nonparametric additive model [10] among others. This gives a rich choice of modeling approaches which can all be applied when analyzing competing risks data, and one can then choose the best fit model. Most of the existing literatures study the cumulative incidence functions via modelling the cause-specific hazard functions. Direct modeling of cumulative incidence functions has recently been studied by [12]. This collection of models gives a rich variety from which a user can choose an appropriate model for analyzing the data. Under semiparametric additive hazard model, we developed statistical procedures to construct confidence intervals and bands for the difference and ratio of two cumulative incidence functions given the covariates. The proposed methods of this paper can be used to estimate the differential (or relative) probability of failure from one cause under two sets of covariate values over time. Unlike direct modeling of cumulative incidence functions, our approach is more flexible in that we do not need to assume simple forms for covariate effects on the cumulative incidence function. Thus, our approach can capture covariate effects that are nonlinear on the cumulative incidence function. Our R-package for the methods proposed is available upon request from the corresponding author and will be posted on a website.

Figure 1
Cumulative incidence function of cause 1 (solid line) at (x, z) = (1, .5) with 50 random estimates (dotted line) for sample sizes 100 and 250 with 30% censoring.
Figure 2
Cumulative time-varying effects for malignant melanoma death based on the semiparametric model with baseline, tumor thickness and ulceration as nonparametric part.

ACKNOWLEDGEMENTS

The research of the first author was in part supported by the intramural research program of National Institutes of Health and in part by NSF grants DMS-0304922 and DMS-0604576. The research of the second author was partially supported by NSF grant DMS-0604576 and NIH grant 2 RO1 AI054165-04. The research of third author was supported by the intramural research program of Eunice KennedyShriver National Institute of Child Health and Human Development. The authorship is listed in alphabetical order.

APPENDIX: Proof of Asymptotic Results

The following results state that n(F^1(tx0,z0)F1(tx0,z0)) is asymptotically equivalent to a sum of martingale U1(t|x0, z0), and it converges in distribution to a Gaussian process. Using arguments similar to those given by [1] it can be shown that

n(β^kβk)=Ck11nΣi=1nDki+op(1),

and

n(A^k(t)Ak(t))=1nΣi=1nQki(t)+op(1),

where

Qki(t)=0t(n1Wk(u))1ωki(u)xidMki(u)0tXk(u)Z(u)duCk1Dki.

Note that Mki is a martingale with respect to the counting process Nki.

Define Wk(tx0,z0)=n(Λ^k(tx0,z0)Λk(tx0,z0)). Then the process Wk(t|x0, z0) is asymptotically equivalent to Wk(t|x0, z0), since

Wk(tx0,z0)=n(x0TA^k(t)+tz0Tβ^kx0TAk(t)tz0Tβk)=x0T[n(A^k(t)Ak(t))]+tz0T[n(β^kβk)]1nΣi=1n(x0TQki(t)+tz0TCk1Dki)W~k(tx0,z0).

Note that

F^1(tx0,z0)F1(tx0,z0)=0tS(ux0,z0)d{Λ^1(ux0,z0)Λ1(ux0,z0)}+0t{S^(ux0,z0)S(ux0,z0)}dΛ^1(ux0,z0).

Using the Taylor expansion to the second component in the preceding expression,

0t{S^(ux0,z0)S(ux0,z0)}dΛ^1(ux0,z0)=0tS(ux0,z0)(Σk=1KΛ^k(ux0,z0)Σk=1KΛk(ux0,z0))dΛ^1(ux0,z0).

Then it follows that

n(F^1(tx0,z0)F1(tx0,z0))0tS(ux0,z0)dW~1(ux0,z0)Σk=1K0tS(ux0,z0)W~k(ux0,z0)dΛ1(ux0,z0).
(7)

From integration by parts,

0tS(ux0,z0)W~k(ux0,z0)dΛ1(ux0,z0)=0tW~k(ux0,z0)S(ux0,z0)dΛ1(ux0,z0)=0tW~k(ux0,z0)dF1(ux0,z0)=F1(tx0,z0)W~k(tx0,z0)0tF1(ux0,z0)dW~k(ux0,z0)=0t{F1(tx0,z0)F1(ux0,z0)}dW~k(ux0,z0)=0tF1C(t,u)dW~k(ux0,z0).

Since

W~k(tx0,z0)=1nΣi=1n(x0TQki(t)+tz0TCk1Dki),

it follows that

0tS(ux0,z0)W~k(ux0,z0)dΛ1(ux0,z0)=0tF1C(t,u)dW~k(ux0,z0)=1nΣi=1n(0tF1C(t,u)x0T(n1Wk(u))1ωki(u)xidMki(u))0tF1C(t,u)x0TXk(u)TZ(u)duCk1Dki+(0tF1C(t,u)duz0TCk1Dki).
(8)

Then from (7) and (8) n(F^1(tx0,z0)F1(tx0,z0)) can be approximated by martingale processes

U1(tx0,z0)=1nΣi=1n1i(tx0,z0),
(9)

where

1i(tx0,z0)=0tS(ux0,z0)x0T(n1W1(u))1ω1i(u)xidM1i(u)0tS(ux0,z0)x0TX1(u)Z(u)duC11D1i0tS(ux0,z0)duz0TC11D1iΣk=1K(0tF1C(t,u)x0T(n1Wk(u))1ωki(u)xidMki(u))0tF1C(t,u)x0TXk(u)Z(u)duCk1Dki(0tF1C(t,u)duz0TCk1Dki).
(10)

Under our assumption that Tki does not equal Tli for kl, k, l = 1, …, K, Nki and Nli cannot jump at the same time, and {Mki(t) : k = 1, …, K} are orthogonal martingales. An application of martingale central limit theorem [2] implies that the processes U1(t|x0, z0) converges weakly to a zero-mean Gaussian process with variance that can be consistently estimated with σ^12(tx0,z0), where

σ^12(tx0,z0)=1nΣi=1n(^1i(tx0,z0))2,
(11)

and ∈̂1i(t|x0, z0 by plugging in estimates of the unknown quantities.

REFERENCES

1. McKeague IW, Sasieni PD. A partly parametric additive risk model. Biometrika. 1994;81:501–514.
2. Andersen PK, Borgan Ø, Gill RD, Keiding N. Statistical models based on counting processes. Springer-verlag; New York: 1993.
3. Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley; New York: 1980.
4. Cox DR. Regression models and life tables (with discussion) Journal of royal statistical society, series–B. 1972;34:187–220.
5. Andersen PK, Gill RD. Cox's regression model for counting processess: A large sample study. Annals of Statistics. 1982;10:1100–1120.
6. Cheng SC, Fine JP, Wei LJ. Prediction of cumulative incidence funciton under the proportional hazards model. Biometrics. 1998;54:219–228. [PubMed]
7. Lin DY, Wei LJ, Ying Z. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika. 1993;80:557–572.
8. Shen Y, Cheng SC. Confidence Bands for Cumulative Incidence Curves under the Additive Risk Model. Biometrics. 1999;55:1093–1100. [PubMed]
9. Lin DY, Ying Z. Semiparametric analysis of the additive risk model. Biometrika. 1994;81:61–71.
10. Aalen OO, Borgan O, Fekjaer H. Covariate adjustment of event histories estimated from Markov chains: The additive approach. Biometrics. 2001;57:993–1001. [PubMed]
11. Martinussen T, Scheike TH. Dynamic Regression Models for Survival Data. Springer; New York: 2006.
12. Scheike TH, Zhang MJ, Gerds T. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:1–16. DOI: 10.1093/biomet/asm096.