Home | About | Journals | Submit | Contact Us | Français |

**|**Biometrika**|**PMC3633199

Formats

Article sections

- Abstract
- Introduction
- Estimation procedures with multivariate competing risks data
- Danish twin data
- Simulation study
- Discussion
- References

Authors

Related links

Biometrika. 2010 March; 97(1): 133–145.

PMCID: PMC3633199

Thomas H. Scheike

Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, Copenhagen DK-1014, Denmark, ; Email: ts/at/biostat.ku.dk

Yanqing Sun

Department of Mathematics and Statistics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, North Carolina 28223, U.S.A., Email: yasun/at/uncc.edu

Mei-Jie Zhang

Division of Biostatistics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin 53226, U.S.A., Email: meijie/at/mcw.edu

Institute of Public Health, University of Southern Denmark, Winslowparken 17, Odense DK-5000, Denmark, ; Email: tkjensen/at/health.sdu.dk

Received 2008 October; Revised 2009 August

Copyright © 2010 Biometrika Trust

This article has been cited by other articles in PMC.

We propose a semiparametric random effects model for multivariate competing risks data when the failures of a particular type are of interest. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model. The associations between the cause-specific failure times can be studied through dependence parameters of copula functions that are allowed to depend on cluster-level covariates. A cross-odds ratio-type measure is proposed to describe the associations between cause-specific failure times, and its relationship to the dependence parameters is explored. We develop a two-stage estimation procedure where the marginal models are estimated in the first stage and the dependence parameters are estimated in the second stage. The large sample properties of the proposed estimators are derived. The proposed procedures are applied to Danish twin data to model the cumulative incidence for the age of natural menopause and to investigate the association in the onset of natural menopause between monozygotic and dizygotic twins.

This work is motivated by a dataset from the Danish twin registry (Skytthe et al., 2003) on the timing of natural menopause among female Danish twins, where natural menopause and surgical menopause are two competing events and time to menopause may be correlated within each pair of twins.

Much of the early work for analyzing competing risks data has been focused on estimation and modelling the cause-specific hazard function (Kalbfleisch & Prentice, 2002). The cumulative incidence function by contrast measures the absolute cause-specific risk, that is, the probability of having experienced failure from a particular cause over time. The effect of a covariate on the cause-specific hazard function and cumulative incidence function can be very different. The two measures provide different perspectives for cause-specific failure times; see Fine & Gray (1999), Katsahian et al. (2006), Latouche et al. (2007) and Beyersmann & Schumacher (2007).

Gray (1988) and Fine & Gray (1999) proposed a proportional hazards model for the subdistribution hazard to model the cumulative incidence function directly. A general semiparametric model for the cumulative incidence function is studied by Scheike et al. (2008), where some covariates have time-varying effects and others have constant effects. Scheike & Zhang (2008) proposed some goodness-of-fit procedures for model checking for these semiparametric models. Here we extend this work to deal with clusters by providing robust standard errors, and thereby also extend the recent work by Chen et al. (2008) that considered the *k*-sample case.

Multivariate failure time data analysis has received much attention in the literature. Dabrowska (1988) proposed a nonparametric estimator for the multivariate survival function. Gill (1993) proposed the concept of the iterated odds ratio to model dependence between failure times. The marginal model approach of Wei et al. (1989) fits univariate models to the marginal distributions without modelling the within-cluster dependence. The efficiency of the marginal approach can be improved by introducing weights that account for failure time dependencies (Cai & Prentice, 1995). A frailty model approach, on the other hand, attempts to model this dependence by postulating a cluster-specific frailty to account for each cluster’s particular susceptibility to failures; see Clayton (1978), Oakes (1982, 1989), Clayton & Cuzick (1985) and Glidden (2000), among others.

There has been little work on developing statistical methods for correlated cause-specific failure times in the competing risks setting. Recently, Bandeen-Roche & Ning (2008) and Bandeen-Roche & Liang (2002) introduced a nonparametric cause-specific cross hazard ratio association measure for multivariate cause-specific failure times; see also Cheng & Fine (2008). Cheng et al. (2007) studied nonparametric estimation of bivariate cause-specific hazard functions and bivariate cumulative incidence functions. They also proposed two nonparametric association measures in terms of the bivariate cause-specific hazard functions and bivariate cumulative incidence functions. These methods do not deal with covariates and require specification of joint cause-specific intensities for all cause combinations. It may be difficult to apply these measures to cause-specific failure time regressions.

To analyze clustered competing risks data, Katsahian et al. (2006) proposed a frailty model for the subdistribution hazard by multiplying lognormal random cluster effects to the proportional subdistribution hazard model. This model allows subject-specific regression effects. One drawback of this approach is that the frailty parameter can be identified solely from the marginal models and that the marginal models depend on the distribution of the cluster-specific effects.

We suggest an alternative approach to avoid this problem by working on a new semiparametric random effects model. Under this model, the marginal cumulative incidence functions follow a generalized semiparametric additive model that is independent of the frailty parameters and frailties account only for correlations between competing risks failure times within clusters. The frailty variance will thus reflect solely the amount of variation due to clusters. The concept of cross-odds ratio is introduced to measure the degree of association between two cause-specific failure times within the same cluster.

Let *K* be the number of clusters and *n _{k}* the number of subjects within the

Let *θ _{k}* be independently distributed random variables. Assume that the subjects within clusters are independent conditional on

(1)

where *η*(*t*) is a (*p* + 1)-dimensional vector of regression functions, *γ* is a *q*-dimensional vector of parameters, Ψ_{νk}(*t*) = *E*_{θk}{exp(−*θ _{k}t*) |

(2)

where *α* is an *r*-dimensional vector of parameters and *Q _{k}* is a regression design vector.

Under (1), the marginal cumulative incidence function, *P*_{1}(*t* | *x _{ki}*,

(3)

The joint survival function of the cause-specific failure times has an Archimedean copula function representation,

where
. Various frailty distributions for *θ _{k}* can be used to describe correlations among members of each cluster. When the conditional distribution of

The random effects model (1) implies the semiparametric additive model (3) for the marginal cumulative incidence functions. This suggests that model (1) can be estimated through a two-stage procedure, where the parameters of the marginal models are estimated in the first stage and the dependence parameter is estimated in the second stage. Ignoring the dependence between the failure times, the parameters of the marginal models can be estimated by modifying the procedure of Scheike et al. (2008) for univariate competing risks data. This resembles the marginal approach of Wei et al. (1989).

Let *G _{c}*(

We consider estimating equations based on Δ* _{ki}N_{ki}* (

Let
be the *n _{k}* × 1 vector of

Following Scheike et al. (2008), the marginal regression functions *η*(*t*) and regression parameters *γ* can be estimated based on the following estimating equations:

(4)

(5)

where *W*(*t*) = diag{*W*^{(}^{k}^{)}(*t*)}, *R*(*t*) = [{*R*^{(1)}(*t*)}^{T}, . . . , {*R*^{(}^{K}^{)}(*t*)}^{T}]^{T}, and

The estimating equations (4) and (5) can be solved by an iterative procedure similar to that given in Scheike et al. (2008); see the Appendix. The procedure can be implemented using standard software for generalized linear models. However, the standard errors obtained from such software are conservative. Correct estimation of the standard errors is described in the following.

Suppose that there exists a constant *N*_{0} such that *n _{k}* <

respectively, for which the formula for *I _{γ}*,

Let *Î _{γ}*,

where *a*^{2} = *aa*^{T} for a column vector *a*. These estimators can be used to construct confidence intervals for *γ* and *η*(*t*).

The Gaussian multiplier resampling method can be used to construct simultaneous confidence intervals for *η*(*t*) over a fixed time interval and make inferences. By application of empirical process theory (van der Vaart, 1998),
converges weakly to the same mean zero Gaussian limit processes as
, given the observed data, where *G*_{1}, . . . , *G _{K}* are independent standard normal random variables. We refer to Lemma 1 of Sun & Wu (2005) for a result that applies directly to this setting. Hence a (1 −

Let

(6)

where *i*, *j* are two subjects within cluster *k*. It follows that, for *i* ≠ *j*,

(7)

where *ν _{k}* =

Let *H _{k}* denote the index set for the

(8)

where *D _{α,kij}*(

We show in the Appendix that has the following approximation:

for which the formulae for *W _{α,k}*(

where *Î _{α}* and

There seems to be no simple direct extension of the association measures for multivariate failure times to the competing risks data. The popular Kendall’s tau-like interpretation of dependence parameter is no longer applicable to multivariate cause-specific failure times since cumulative incidence functions are in general not distribution functions.

We introduce a cross-odds ratio to describe the dependence between cause-specific failure times. Consider the case of bivariate failure times (*T _{ki}*,

(9)

where (*A*)* ^{c}* means the event complementary to

In the presence of covariates, the conditional cross-odds ratio given the covariates, *π _{i}*

When the random effects *θ _{k}* follow a gamma distribution with mean 1 and variance

We identified 503 pairs of female twins born between 1931 and 1952 through the Danish Twin Registry. There are 269 monozygotic twin pairs and 234 dizygotic twin pairs identified by Skytthe et al. (2003). Before they reach natural menopause, some women may experience surgical menopause as a result of removal of the uterus, cervix or ovaries and others may be treated with hormones that affect the timing of menopause. Surgical menopause and hormonal treatment are therefore competing risks for natural menopause. Of 1006 twins, 416 twins experienced natural menopause, 57 twins had surgical related menopause, 183 twins underwent hormonal treatments before natural menopause, 246 twins had censored event times. There were 104 twins with unknown failure types or censoring status and they were excluded from the analysis. These summaries do not reflect that the probabilities of different events changed considerably over the time cohorts. We partition the data into four cohorts of almost equal size according to birthdate. Cohort 1 from 1931–1936 has 120 twin pairs, cohort 2 from 1937–1942 has 123 twin pairs, cohort 3 from 1943–1947 has 121 twin pairs, and cohort 4 from 1948–1952 has 139 twin pairs. In cohort 1, there was zero censoring, 154 natural menopause, 22 surgical menopause, 40 hormonal treatments, and 24 cases with unknown failure types or censoring status. These numbers were (3, 137, 13, 64, 29) for cohort 2, (65, 89, 11, 54, 23) for cohort 3, and (178, 36, 11, 25, 28) for cohort 4. In the analysis, we allow the censoring distribution to depend on cohorts.

We first estimate the marginal cumulative incidence of natural menopause excluding the cohort structure. Let *x _{ki}* = {1,

Estimated probability of natural menopause in Danish twins data. The estimated marginal cumulative incidence functions for monozygotic twins are shown in panel (a) and the upper curves in panel (b) are for cohort 1 and the lower curves are for cohort **...**

To examine whether the cumulative incidence of natural menopause changes over the different cohorts, we estimate the cumulative incidence function for natural menopause by adjusting for time cohort effects. Let *I _{ki}* (cohort 2) be the cohort indicator of whether the twins in the

We now examine whether there is an association in the occurrence of natural menopause between twins of the same family under the random effects model (1). To allow for different degrees of association for different types of twins, we let *ν _{k}* =

The joint probability of both twins experiencing natural menopause by age *t* depends on the degree of association between the twins. This probability is *v _{kij}*(

Finally, we calculate the cross-odds ratio to show how the onset of natural menopause for one twin affects the probability of the onset for the other twin. The cross-odds ratios for monozygotic twins are around 2. This indicates that, if one twin has experienced natural menopause, the odds of the second twin experiencing natural menopause by the same age essentially doubles the unconditional odds of natural menopause for the second twin. For the dizygotic twins, we find a cross-odds ratio that lies around 1.4.

We conducted a small simulation study for model (1) to evaluate small sample performance of the proposed method. We take *K* = 100 and 200 pairs and let the random effects follow a Gamma distribution with mean 1 and variance *ν*. We consider a situation with one covariate that is uniformly distributed on [0, 1]. The baseline function is set as *η*(*t*) = 0.5*t* and the regression coefficient *γ* = 0.5. The censoring times are generated from the uniform distribution on [0, 2]. The probability of observing the cause of interest is 0.42, which is comparable to the Danish twin dataset where natural menopause occurred in about 40% of the twins.

Table 1 summarizes the simulation results for and based on 1000 realizations for *γ* = 0.5 and *ν* equal to 2, 1 and 0.5. The estimator performs satisfactorily at all levels of the random effect variance *ν* and for both sample sizes. The estimator is essentially unbiased and its variance is well estimated by the robust variance estimator. The biases of the estimator for *ν* are reasonably small, and the estimator for the variance of perform well. The simulations also find the baseline function well approximated by its estimator, which are not reported here.

The random effects model (1) is estimated through a two-stage approach where the marginal parameters are estimated separately from the dependence parameters. Additional efficiency can be gained by jointly estimating both the marginal parameters and the dependence parameters and by using all the joint observations of the inverse probability of censoring weighted responses for the dependence parameters. This issue deserves further attention and investigation.

The asymptotic distribution of _{i}_{|}* _{j}* (

The research was partially supported by grants from the National Cancer Institute to Scheike and Zhang, and by grants from the National Science Foundation and the National Institutes of Health to Sun. Thomas Scheike was also supported by a grant from the Danish Research Council. We appreciate the valuable comments of the reviewers and the editor, which have improved the content and presentation of this paper.

The estimating equations (4) and (5) can be solved simultaneously in a way similar to Scheike et al. (2008). By the Taylor expansion of *P*_{1}(*t*, *η*, *γ*) around the current value of the estimates (^{(}^{m}^{)}(*t*), ^{(}^{m}^{)}), we have

Let , and . We have the following approximated estimating equations:

(A1)

(A2)

Solving the equations (A1) and (A2) for *γ* and *η*(*t*), the (*m* + 1)th iterative estimators are given by
and
, where

(A3)

Let *D _{η}*(

Let
,
,
, {*G _{c}*(

where *Y*(*s*) = ∑* _{k,i}Y_{ki}* (

(A4)

where
. Let *Î _{γ}* and

Similarly, by using a Taylor expansion, we have

(A5)

where

Both *I _{η}*(

To establish the asymptotic distribution for , we consider the decomposition

(A6)

By a Taylor approximation,

Let . It follows from equations (A4), (A5), (A6), (A7) and (A8) that

where

and *I _{η}*(

- Bandeen-Roche K, Liang K-Y. Modelling multivariate failure time associations in the presence of a competing risk. Biometrika. 2002;89:299–314.
- Bandeen-Roche K, Ning J. Nonparametric estimation of bivariate failure time associations in the presence of a competing risk. Biometrika. 2008;95:221–32. [PMC free article] [PubMed]
- Beyersmann J, Schumacher M. Letter to the editor: misspecified regression model for the subdistribution hazard of a competing risk. Statist Med. 2007;26:1649–52. [PubMed]
- Cai J, Prentice RL. Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika. 1995;82:151–64.
- Chen BE, Kramer JL, Greene MH, Rosenberg PS. Competing risks analysis of correlated failure time data. Biometrics. 2008;64:172–79. [PMC free article] [PubMed]
- Cheng Y, Fine J. Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data. Biometrika. 2008;95:233–40.
- Cheng Y, Fine JP, Kosorok MR. Nonparametric association analysis of bivariate competing-risks data. J Am Statist Assoc. 2007;102:1407–15.
- Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–51.
- Clayton DG, Cuzick J. Multivariate generalisations of the proportional hazards model. J. R. Statist. Soc A. 1985;148:82–117.
- Dabrowska DM. Kaplan–Meier estimate on the plane. Ann Statist. 1988;16:1475–89.
- Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Statist Assoc. 1999;94:496–509.
- Frank MJ. On the simultaneous associativity of f(x, y) and x + y − f(x, y) Aequationes Mathematicae. 1979;19:194–226.
- Gill RD. Multivariate survival analysis. Theory Prob Appl. 1993;37:18–31.
- Glidden D. A two-stage estimator of the dependence parameter for the Clayton-Oakes model. Lifetime Data Anal. 2000;6:141–56. [PubMed]
- Gray RJ. A class of
*k*-sample tests for comparing the cumulative incidence of a competing risk. Ann Statist. 1988;16:1141–54. - Gumbel EJ. Bivariate exponential distributions. J Am Statist Assoc. 1960;55:698–707.
- Hougaard P. Analysis of Multivariate Survival Data. New York: Springer; 2000.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: Wiley; 2002.
- Katsahian S, Resche-Rigon M, Chevret S, Porcher R. Analysing multicentre competing risks data with a mixed proportional hazards model for the subdistribution. Statist Med. 2006;25:4267–78. [PubMed]
- Latouche A, Boisson V, Chevret S, Porcher R. Misspecified regression model for the subdistribution hazard of a competing risk. Statist Med. 2007;26:965–74. [PubMed]
- Oakes D. A model for association in bivariate survival data. J. R. Statist. Soc B. 1982;44:414–22.
- Oakes D. Bivariate survival models induced by frailties. J Am Statist Assoc. 1989;84:487–93.
- Scheike TH, Zhang M-J. Flexible competing risks regression modelling and goodness-of-fit. Lifetime Data Anal. 2008;14:464–83. [PMC free article] [PubMed]
- Scheike TH, Zhang M-J, Gerds T. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–20.
- Skytthe A, Pedersen N, Kaprio J, Stazi M, Hjelmborg J, Iachine I, Vaupel J, Christensen K. Longevity studies in genomeutwin. Twin Res. 2003;6:448–54. [PubMed]
- Sun Y, Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scand J Statist. 2005;32:21–47.
- van der Vaart AW. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.
- Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. J Am Statist Assoc. 1989;84:1065–73.

Articles from Biometrika are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |