Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2891314

Formats

Article sections

Authors

Related links

Biometrics. Author manuscript; available in PMC 2010 December 21.

Published in final edited form as:

PMCID: PMC2891314

NIHMSID: NIHMS159605

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

In genetic family studies, ages at onset of diseases are routinely collected. Often one is interested in assessing the familial association of ages at onset of a certain disease type. However, when a competing risk is present and is related to the disease of interest, the usual measure of association by treating the competing event as an independent censoring event is biased. We propose a bivariate model that incorporates two types of association: one is between the first event time of paired members, and the other is between the failure types given the first event time. We consider flexible measures for both types of association, and estimate the corresponding association parameters by adopting the two-stage estimation of Shih and Louis (1995) and Nan et al. (2006). The proposed method is illustrated using the kinship data from the Washington Ashkenazi Study.

In family studies, often one is interested in assessing the familial association in diseases with ages at onset. When only one disease type is considered (e.g. cancer), there exists an extensive literature for the analysis of correlated failure time data. These methods can be used to model and estimate the association between ages at onset of family members (Hougaard, 2001). For example, Shih and Louis (1995) proposed a two-stage estimation procedure for estimating the association of paired failure times. Hougaard et al. (1992) proposed bivariate survival models which were used to measure the similarities between the lifetimes of adult Danish twins. Li et al. (1998) proposed a parametric likelihood approach to study the familial aggregation in lung cancer risk from case-control family studies. When the competing risk is present, all the above mentioned methods censor the failure of interest at the time of the competing event and proceed with the proposed modeling and analysis of correlated failure time data. Such censoring implicitly assumes that the competing events are independent and that the estimate of the marginal distribution for the event of interest (e.g. the Kaplan-Meier estimate) is consistent (Kalbfleish and Prentice, 2002). Bandeen-Roche and Liang (2002), Chatterjee et al. (2003), and Bandeen-Roche and Ning (2008), showed that censored observations due to a competing risk may affect the estimation of association of correlated failure times of the event of interest. It is well recognized that if the competing events are dependent, the marginal distribution of the failure time for the event of interest is not identifiable, because only the failure time of the first event is observable. Instead of the marginal distribution, the univariate cause-specific hazard (Prentice et al.,1978) and incidence function are observable quantities, and have been used in making cause-specific inference (Kalbfleish and Prentice, 2002). Bandeen-Roche and Liang (2002) extended the concept of univariate cause-specific hazard to a bivariate setting. They proposed a cause-specific cross-ratio which accommodates competing risks in measuring the association of paired failure times. Recently Bandeen-Roche and Ning (2008) proposed a nonparametric estimation method to estimate the constant cause-specific cross-ratio. Cheng and Fine (2008) gave an alternative representation of the cause-specific cross-ratio, and proposed a simple plug-in nonparametric estimate.

In this article, we are interested in assessing the familial association of disease in the presence of competing risk of the kinship data collected from the Washington Ashkenazi Study (Struewing et al., 1997). In this study, more than 5000 volunteer Ashkenazi Jews living in Washington D.C. provided blood samples for genotyping of BRCA1/BRCA2 mutations. They also gave family history information on cancers and mortality. One primary interest of the study was to estimate the breast cancer risk among the carriers and non-carriers of the gene mutations. For our application, we consider female first-degree relatives of the volunteer participants, and are interested in studying their familial association of ages at onset of cancer and non-cancer mortality. Here cancer and non-cancer mortality are competing risks. To this end, we propose a bivariate survival model for correlated failure times of relatives which incorporates competing risks. The proposed model is decomposed into two parts: one based on the time to first event, and the other is based on the event type. For the former, we generalize the bivariate survival model of Clayton (1978) to one which allows for piecewise constant cross ratios. For the latter, we generalize the bivariate logistic model to one with piecewise constant odds ratios. Different from the methods of Bandeen-Roche and Ning (2008) and Cheng and Fine (2008) which treat each bivariate failure type one at a time, we incorporate all possible bivariate failure types, namely both cancers, both non-cancer deaths, and one cancer and one non-cancer death, to model the association related to failure types. In addition, since we separately model the association of times to first event and failure types, we are able to assess the effects of these two types of associations on the cause-specific cross-ratio. One important feature of our model is that even though both cross-ratio and odds-ratio are piecewise constant, the cause-specific cross-ratio generally is non-piecewise constant.

The remainder of the paper is organized as follows. In Section 2 we propose bivariate survival models for the association of correlated failure times and in the presence of competing risks. In Section 3, we propose a quasi-likelihood estimation procedure for familial association. In Section 4 and 5, we use simulations and the WAS study data to illustrate the proposed methodology. A discussion follows in Section 6.

Consider two competing events. Under the usual assumption of competing risks, for any given subject, only the first event is observable. For each individual, in the absence of censoring, the observable data is time to the first event and failure type. For a pair of individuals, *j* = 1, 2, let *T _{j}* define the time to the first event for individual

$${\lambda}_{j}^{k}(t)=\underset{\mathrm{\Delta}t\to 0}{\text{lim}}\mathrm{\Delta}{t}^{-1}\mathit{\text{Pr}}(t\le {T}_{j}<t+\mathrm{\Delta}t,{Y}_{j}=k|{T}_{j}\ge t)$$

equals ${f}_{j}^{k}(t)/{S}_{j}(t)$. Then ${f}_{j}^{k}(t)$ equals ${f}_{j}(t){\lambda}_{j}^{k}(t)/{\lambda}_{j}(t)$, where ${\lambda}_{j}(t)={\displaystyle {\sum}_{k}{\lambda}_{j}^{k}}(t)$ is the hazard function of *T _{j}*. In the case of only one event type, ${\lambda}_{j}(t)={\lambda}_{j}^{1}(t)$. For convenience, we write ${p}_{j}^{k}(t)\phantom{\rule{thinmathspace}{0ex}}\text{for}\phantom{\rule{thinmathspace}{0ex}}{\lambda}_{j}^{k}(t)/{\lambda}_{j}(t)$ to denote the probability of the event being type

The next stage of the model involves specifying a dependency structure between the members of the pair. We consider two types of association: one is between the first event times (*T*_{1} and *T*_{2}), and the other is between the failure types (*Y*_{1} and *Y*_{2}) given (*T*_{1}, *T*_{2}). For the measure of association between *T*_{1} and *T*_{2}, we consider the cross ratio (Oakes, 1989) defined by

$$\begin{array}{cc}\theta ({t}_{1},{t}_{2})\hfill & =\frac{{\lambda}_{1}({t}_{1}|{T}_{2}={t}_{2})}{{\lambda}_{1}({t}_{1}|{T}_{2}>{t}_{2})}\hfill \\ \hfill & =\frac{S({t}_{1},{t}_{2})f({t}_{1},{t}_{2})}{\frac{\partial}{\partial {s}_{1}}S({s}_{1},{t}_{1})\phantom{\rule{thinmathspace}{0ex}}|{s}_{1}={t}_{1}\frac{\partial}{\partial {s}_{2}}S({t}_{1},{s}_{2})\phantom{\rule{thinmathspace}{0ex}}|{s}_{2}={t}_{2}},\hfill \end{array}$$

(1)

where *S* is the joint survival function of *T*_{1} and *T*_{2}, and *f* is the joint density function. Most research has imposed models or constraints on θ(*t*_{1}, *t*_{2}) such as θ(*t*_{1}, *t*_{2}) is constant or *S* belongs to the archimedean copulas (Oakes, 1989). Nan et al. (2006) proposed a piecewise constant cross-ratio model which allows θ(*t*_{1}, *t*_{2}) to be piecewise constant in one dimension of time. For our application, since each pair consists of first-degree relatives to each other and their event times are paired in an arbitrary order, it is more reasonable to allow the cross ratios to vary in both time dimensions. Therefore, in this work, we model the cross-ratio as a piecewise constant function of the event times of both members in the pair. Specifically the cross-ratio is modelled as $\theta ({t}_{1},{t}_{2})={\displaystyle {\sum}_{i=1}^{K}{\displaystyle {\sum}_{j=1}^{K}{\theta}_{\mathit{\text{ij}}}I({w}_{i-1}\le {t}_{1}<{w}_{i})}I({w}_{j-1}\le {t}_{2}<{w}_{j})}$, where 0 = *w*_{0} < *w*_{1} < < *w _{K}* are a set of pre-specified knots in the appropriate age range of interest. Extending the work of Nan et al. (2006) which assumes θ changes only in one dimension of time, it follows that under the assumption that θ(

$${S}_{{A}_{\mathit{\text{ij}}}}({t}_{1},{t}_{2})={[{S}_{{A}_{\mathit{\text{ij}}}}{({t}_{1},{w}_{j-1})}^{-({\theta}_{\mathit{\text{ij}}}-1)}+{S}_{{A}_{\mathit{\text{ij}}}}{({w}_{i-1},{t}_{2})}^{-({\theta}_{\mathit{\text{ij}}}-1)}-{S}_{{A}_{\mathit{\text{ij}}}}{({w}_{i-1},{w}_{j-1})}^{-({\theta}_{\mathit{\text{ij}}}-1)}]}^{-1/({\theta}_{\mathit{\text{ij}}}-1)}.$$

(2)

When θ(*t*) θ for all *t*_{1} > 0, *t*_{2} > 0, the above bivariate survival function is equivalent to the Clayton model (1978). When θ(*t*_{1}, *t*_{2}) 1, *T*_{1} and *T*_{2} are independent, and *S*(*t*_{1}, *t*_{2}) = *S*_{1}(*t*_{1})*S*_{2}(*t*_{2}). In the appendix, we show the bivariate survival function *S* is determined by the marginal survival functions *S*_{1}, *S*_{2} and θ_{ij}’*s*.

For the measure of association for failure types *Y*_{1} and *Y*_{2} given the event times *T*_{1}, *T*_{2}, we consider the odds ratio defined by

$$\varphi ({t}_{1},{t}_{2})=\frac{{p}^{22}({t}_{1},{t}_{2}){p}^{11}({t}_{1},{t}_{2})}{{p}^{12}({t}_{1},{t}_{2}){p}^{21}({t}_{1},{t}_{2})},$$

where *p*^{k1k2}(*t*_{1}, *t*_{2}) = *Pr*(*Y*_{1} = *k*_{1}, *Y*_{2} = *k*_{2}|*T*_{1} = *t*_{1}, *T*_{2} = *t*_{2}) = *f*^{k1k2}(*t*_{1}, *t*_{2})/*f*(*t*_{1}, *t*_{2}) with *f*^{k1k2}(*t*_{1}, *t*_{2}) = *f*(*t*_{1}, *t*_{2}, *Y*_{1} = *k*_{1}, *Y*_{2} = *k*_{2}).

Let ${p}_{j}^{k}({t}_{1},{t}_{2})$ denote the conditional probability *Pr*(*Y _{j}* =

$${p}^{{k}_{1}{k}_{2}}({t}_{1},{t}_{2};\varphi )=\{\begin{array}{c}[{h}^{{k}_{1}{k}_{2}}({t}_{1},{t}_{2})-\{{h}^{{k}_{1}{k}_{2}}{({t}_{1},{t}_{2})}^{2}-4\varphi (\varphi -1){({p}_{1}^{{k}_{1}}({t}_{1},{t}_{2}){p}_{2}^{{k}_{2}}({t}_{1},{t}_{2}))\}}^{0.5}]/(2(\varphi -1))\hfill \\ \text{if}\phantom{\rule{thinmathspace}{0ex}}\varphi \ne 1\hfill \\ {p}_{1}^{{k}_{1}}({t}_{1},{t}_{2}){p}_{2}^{{k}_{2}}({t}_{1},{t}_{2})\phantom{\rule{thinmathspace}{0ex}}\text{otherwise},\hfill \end{array}$$

(3)

where ${h}^{{k}_{1}{k}_{2}}({t}_{1},{t}_{2})=1-(1-\varphi )\{{p}_{1}^{{k}_{1}}({t}_{1},{t}_{2})+{p}_{2}^{{k}_{2}}({t}_{1},{t}_{2})\}$. In order to allow for flexibility, we will model the odds ratio as a piecewise constant function of the event times, expressed by $\varphi ({t}_{1},{t}_{2})={\displaystyle {\sum}_{i=1}^{K}{\displaystyle {\sum}_{j=1}^{K}{\varphi}_{\mathit{\text{ij}}}I({w}_{i-1}\le {t}_{1}<{w}_{i})}I({w}_{j-1}\le {t}_{2}<{w}_{j})}$,

The cause-specific bivariate distribution of event times can be characterized with the above marginal distributions and two types of association θ and ϕ. Specifically, we can combine these two types of association to derive cause-specific time-dependent measures of association of practical interest. It may be more meaningful to evaluate the association with the cause-specific cross-ratio (Bandeen-Roche and Liang, 2002), since the above cross-ratios θ_{ij}*s* are associated with times to first failure and provide little practical interest. For a given (*t*_{1}, *t*_{2}), the cause-specific cross-ratio is defined by

$${\theta}^{{k}_{1}{k}_{2}}({t}_{1},{t}_{2})=\frac{{\lambda}_{1}^{{k}_{1}}({t}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{2}={t}_{2},{Y}_{2}={k}_{2})}{{\lambda}_{1}^{{k}_{1}}({t}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{2}>{t}_{2})},$$

where ${\lambda}_{1}^{{k}_{1}}({t}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{2}={t}_{2},{Y}_{2}={k}_{2})$ is the conditional hazard at time *t*_{1} with failure type *k*_{1} given the other pair member has an event at time *t*_{2} with failure type *k*_{2} and ${\lambda}_{1}^{{k}_{1}}({t}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{2}>{t}_{2})$ is the conditional hazard at time *t*_{1} with failure type *k*_{1} given the other pair member has survived time *t*_{2}. It is straightforward to show that θ^{k1k2}(*t*_{1}, *t*_{2}) can be alternatively represented by

$${\theta}^{{k}_{1}{k}_{2}}({t}_{1},{t}_{2})=\theta ({t}_{1},{t}_{2})\frac{{p}^{{k}_{1},{k}_{2}}({t}_{1},{t}_{2})}{\mathit{\text{Pr}}({Y}_{1}={k}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{1}={t}_{1},{T}_{2}>\phantom{\rule{thinmathspace}{0ex}}{t}_{2})\mathit{\text{Pr}}({Y}_{2}={k}_{2}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{2}={t}_{2},{T}_{1}>\phantom{\rule{thinmathspace}{0ex}}{t}_{1})}.$$

(4)

Thus θ^{k1k2}(*t*) is decomposed into the product of two terms: the first term is the overall cross-ratio, and the second term provides additional contribution to the dependency of cause-specific event times arising from the dependency of failure types between correlated members. Note that according to (4), even if θ(*t*_{1}, *t*_{2}) and ϕ(*t*_{1}, *t*_{2}) are both piecewise constant, θ^{k1k2}(*t*_{1}, *t*_{2}) may not be.

Although the cause-specific cross-ratio is a meaningful association measure, it is expressed as an instantaneous odds-ratio which may be difficult to interpret for practitioners. We consider another function, namely the conditional cumulative incidence defined as *Pr*(*T*_{1} <= *t*_{1}, *Y*_{1} = *k*_{1} | a ≤ *T*_{2} < b, *Y*_{2} = *k*_{2}), which is simpler to interpret, since it simply measures the cumulative cause-specific incidence of one pair member given the other member’s failure time and failure type. The impact of the other pair member’s failure time and failure type on the cumulative incidence can be assessed by comparing the conditional cumulative incidence with its marginal cumulative incidence counterpart.

Let *T _{ij}* and

We first consider estimating the piecewise constant cross-ratios θ_{ij}s by adopting the two-stage estimation method of Shih and Louis (1995) and sequential two-stage method of Nan et al. (2006). For estimation of each θ_{lm}, we use the subset *D _{lm}* of paired data {

In the following, we describe how to estimate θ_{11}. At the first stage we estimate the common marginal survival function by the Nelson estimate ignoring dependency between failure times in the same family. The Nelson estimate specifies the probabilities for the bottom boundaries of *A*_{1j}, *j* = 1, , *K* and left boundaries of *A*_{k1}, *k* = 1, ,*K*. At the second stage, we treat the estimated probabilities at the bottom and left boundaries of *A*_{11} as known, and use data set *D*_{11} to obtain the estimate of θ_{11} which maximizes the following composite-likelihood with *l* = 1, *m* = 1

$$\begin{array}{cc}L({\theta}_{\mathit{\text{lm}}})=\hfill & {\displaystyle \prod _{i=1}^{n}{\displaystyle \prod _{(\mathit{\text{ip}},\mathit{\text{iq}})\in {\mathcal{D}}_{\mathit{\text{lm}}}\cup {\mathcal{D}}_{\mathit{\text{ml}}}}^{{M}_{i}}C{({u}_{\mathit{\text{ip}}}^{\mathit{\text{lm}}},{v}_{\mathit{\text{iq}}}^{\mathit{\text{lm}}};{\theta}_{\mathit{\text{lm}}})}^{(1-{\tilde{\delta}}_{\mathit{\text{ip}}})(1-{\tilde{\delta}}_{\mathit{\text{iq}}})}{\left\{\frac{-\partial C({u}_{\mathit{\text{ip}}}^{\mathit{\text{lm}}},{v}_{\mathit{\text{iq}}}^{\mathit{\text{lm}}};{\theta}_{\mathit{\text{lm}}})}{\partial {u}_{\mathit{\text{ip}}}}\right\}}^{{\tilde{\delta}}_{\mathit{\text{ip}}}(1-{\tilde{\delta}}_{\mathit{\text{iq}}})}}}\hfill \\ \hfill & \times {\left\{\frac{-\partial C({u}_{\mathit{\text{ip}}}^{\mathit{\text{lm}}},{v}_{\mathit{\text{iq}}}^{\mathit{\text{lm}}};{\theta}_{\mathit{\text{lm}}})}{\partial {u}_{\mathit{\text{iq}}}}\right\}}^{{\tilde{\delta}}_{\mathit{\text{iq}}}(1-{\tilde{\delta}}_{\mathit{\text{ip}}})}{\left\{\frac{-{\partial}^{2}C({u}_{\mathit{\text{ip}}}^{\mathit{\text{lm}}},{v}_{\mathit{\text{iq}}}^{\mathit{\text{lm}}};{\theta}_{\mathit{\text{lm}}})}{\partial {u}_{\mathit{\text{ip}}}\partial {u}_{\mathit{\text{iq}}}}\right\}}^{{\tilde{\delta}}_{\mathit{\text{ip}}}{\tilde{\delta}}_{\mathit{\text{iq}}}},\hfill \end{array}$$

(5)

where * _{ip}* is the collection of pairs in data set

Assume that the marginal survival function is continuous and *Pr*(*w*_{i−1} ≤ *T*_{1} < *w _{i}*,

To estimate the association of failure types between paired members, only paired data with failure observed in both members are used, because pairs with censored observations contain no information about the correlation. Specifically, for the estimation of each ϕ_{lm}, *l* = 1, , *K*, *m* = 1, , *K*, we use the subset *D̄ _{lm}* of paired data {

$$\begin{array}{cc}L({\varphi}_{\mathit{\text{lm}}})=\hfill & {\displaystyle \prod _{i=1}^{n}{\displaystyle \prod _{(\mathit{\text{ip}},\mathit{\text{iq}})\in {\overline{\mathcal{D}}}_{\mathit{\text{lm}}}\cup {\overline{\mathcal{D}}}_{\mathit{\text{ml}}}}^{{M}_{i}}{\tilde{p}}^{11}{({X}_{\mathit{\text{ip}}},{X}_{\mathit{\text{iq}}};{\varphi}_{\mathit{\text{lm}}})}^{(2-{y}_{\mathit{\text{ip}}})(2-{y}_{\mathit{\text{iq}}})}{\tilde{p}}^{12}{({X}_{\mathit{\text{ip}}},{X}_{\mathit{\text{iq}}};{\varphi}_{\mathit{\text{lm}}})}^{(2-{y}_{\mathit{\text{ip}}})({y}_{\mathit{\text{iq}}}-1)}}}\hfill \\ \hfill & \times \phantom{\rule{thinmathspace}{0ex}}{\tilde{p}}^{21}{({X}_{\mathit{\text{ip}}},{X}_{\mathit{\text{iq}}};{\varphi}_{\mathit{\text{lm}}})}^{({y}_{\mathit{\text{ip}}}-1)(2-{y}_{\mathit{\text{iq}}})}{\tilde{p}}^{22}{({X}_{\mathit{\text{ip}}},{X}_{\mathit{\text{iq}}};{\varphi}_{\mathit{\text{lm}}})}^{({y}_{\mathit{\text{ip}}}-1)({y}_{\mathit{\text{iq}}}-1)},\hfill \end{array}$$

(6)

where * _{ij}* is the collection of pairs in data set

In addition to the assumptions required to establish the consistency for the estimate of θ_{ij}, assume that *Pr*(*Y*_{1} = *k*_{1}, *Y*_{2} = *k*_{2} | *T*_{1} = *t*_{1}, *T*_{2} = *t*_{2}) > 0, (*t*_{1}, *t*_{2}) *A _{ij}* for all

With the above complex sequential estimation procedure, further work is needed to derive the asymptotic properties of the estimates of θ_{ij}’s and ϕ_{ij}. In this paper, we use the bootstrap approach (Efron and Tibshirani, 1993) with family as the bootstrap sampling units to obtain the variance of these semi-parametric estimates.

After these parameters are estimated, we are able to plug them in (2) to obtain the estimate of the cause-specific cross-ratios. We can also calculate the estimate of the conditional cumulative incidence function *Pr*(*T*_{1} ≤ *t*_{1}, *Y*_{1} = *k*_{1} | *a* ≤ *T*_{2} < *b*, *Y*_{2} = *k*_{2}), *k _{j}* = 1, 2,

$$\widehat{P}r({T}_{1}\le {t}_{1},{Y}_{1}={k}_{1}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}a\le {T}_{2}<b,{Y}_{2}={k}_{2})=\frac{{\displaystyle {\int}_{0}^{{t}_{1}}}{\displaystyle {\int}_{{a}^{-}}^{b}}{\tilde{p}}^{{k}_{1}{k}_{2}}({u}_{1},{u}_{2};\widehat{\varphi}({u}_{1},u2)){\widehat{S}}_{{A}_{{i}_{{u}_{1}}{i}_{{u}_{2}}}}({\mathit{\text{du}}}_{1},{\mathit{\text{du}}}_{2})}{{\displaystyle {\int}_{a}^{b}}{\widehat{f}}^{{k}_{2}}(u)\mathit{\text{du}}},$$

where *Ŝ* is the semi-parametric estimate, the subscript *i _{t}* =

We applied the proposed model and estimation method to the WAS study data. We analyzed a subset of the data in which the first-degree relatives of the female probands are also first-degree relatives to each other. That is, (mother, sister), (daughter, daughter) and (sister, sister) pairs were included in the analysis. The subset was chosen because the pairwise association should be similar among these pairs of first-degree relatives. Cancer and non-cancer mortality are the two competing events considered in this example. Among these women, the majority of the cancer incidences were breast and ovarian tumors. The ages of the relatives at the time of the interview of the proband define the censoring times. The data of 12,255 subjects coming from 4,235 distinct families were used to obtain the Nelson-type nonparametric estimate of the marginal survival of time to the first failure, overall hazard and cause-specific hazard, and the probability of failure type given the individual’s failure time. The estimates of the probability of failure type being cancer is displayed in Figure 1. It shows that cancer risk is higher than non-cancer death in mid-age but lower in young and old ages.

The non-parametric estimate of the probability of failure type being cancer given the age at onset (solid circle) and a fitted line of a fourth-degree polynomial model to this non-parametric estimate. The fitted model is used in the simulation study to **...**

Table 1 shows the estimation results for the association parameters. The number of pairs with both members having cancer(*d*_{11}), number of pairs with one member having cancer and the other member dead of non-cancer (*d*_{12}), and number of pairs with both members dead of non-cancer (*d*_{22}) are listed in the second column. The bootstrap standard errors were obtained from 500 bootstrap samples. In estimating the association parameters, to assure there are sufficient number of paired events in each sub-region to calculate the piecewise cross-ratios and odds-ratios, and to choose each region which is biologically meaningful, the number of knots was set at *K* = 3 with *w*_{1} = 50, *w*_{2} = 70, and *w*_{3} = ∞ (see Table 1). The cut-off values 50 and 70 divide the cohort into young (< 50 years old), mid-age (50−70) and old (> 70) subgroups. The data of 13,962 pairs from 4,152 families were used to estimate the cross-ratios. All the piecewise cross-ratios are close to 1, indicating the association of times to first failure between first-degree relatives is modest and almost time invariant.

Nine hundred and fifty pairs with failures observed in both pair members were used to estimate the odds-ratios. Of these 950 pairs, there were 767 distinct paired failure times. Because at each of these distinct paired failure times, most of the time there was only one observation, the non-parametric estimate of the probability of failure type given the paired failure time, ${p}_{j}^{k}({t}_{1},{t}_{2})$ is mostly 0 or 1. Thus, ϕ_{ij}s cannot be estimated reliably under this non-parametric estimation. Therefore, we considered the two alternative approaches for the estimation of ${p}_{j}^{k}({t}_{1},{t}_{2})$ described in the previous Section: assuming ${p}_{j}^{k}({t}_{1},{t}_{2})$ to be piecewise constant vs. the probability of failure type depends only on the individuals’ own failure time. The estimated odds-ratios in the six sub-regions as ordered in Table 1 for the two approaches are (4.25, 1.95, 0.86, 1.35, 1.63, 1.65) and (5.22, 1.20, 0.88, 1.57, 2.07, 1.97), respectively. Both approaches yielded similar estimates of the odds-ratios. The estimates based on the latter approach are shown in Table 1. The distributions of the bootstrap estimates of θ_{ij}s were close to normal, but those of ϕ_{ij}s were skewed. Hence the estimates of ϕ_{ij}s were log-transformed. Compared to the estimates of the cross-ratios, the odds-ratios for the association of failure types between the first-degree relatives vary in magnitude over different ranges of ages at onset. Of particular note is the large odds ratio for the ages at onset younger than 50. It implies that the probability of having cancer for a woman who had an event before age 50 is more than 5 times higher when her first-degree relative had cancer before age 50 than if her first-degree relative died of non-cancer before age 50. For women older than 70 years, there is a trend that her chance of developing cancer is doubled if her first-degree relative had developed breast cancer after age 50 than if her first-degree relative died of non-cancer after age 50 (ϕ_{22} = 2.07, ϕ_{23} = 1.97).

The estimates of cause-specific cross-ratios are displayed in Figure 2. The cross-ratio for cancer is high (> 2) when the ages at onset in both members are young and slightly elevated (1.5 − 2) when both members are old. However these elevations, likely due to few cancer cases in both pair member in these age ranges, are not statistically significant.

To see the impact of the failure time and failure type of one family member on the cumulative incidence of the other family member, we plot the conditional incidence function along with the unconditional counter part. The four plots in the left panel of Figure 3 display the marginal and conditional cumulative risk of a woman developing cancer, and the four plots in the right panel display the marginal and conditional cumulative risk of a woman dead of non-cancer. These plots show that a woman’s cumulative risk of cancer, compared to the marginal cumulative risk, is increased if her first-degree relative had cancer before age 70, and decreased if her first-degree relative had cancer after age 70. If the first-degree relative died of non-cancer before age 50, then the woman’s cumulative risk of cancer is still increased (top two plots in the left panel), although the magnitude of the increase is lower than if her first-degree had cancer. In contrast, if the first-degree relative died of non-cancer after age 50, the woman’s cumulative risk of cancer is decreased slightly, and her cumulative risk of non-cancer death is increased.

Left panel: marginal and conditional cumulative cancer incidence; right panel: marginal and conditional cumulative non-cancer mortality incidence.

One may be also interested in the conditional cumulative incidence given the failure type of the first-degree relative. In Figure 4, the top panel shows that a woman’s conditional cumulative cancer incidence increased if her first-degree relative had cancer, and decreased if her first-degree relative had died of non-cancer. The lifetime (up to age 100) cumulative incidence of cancer increased from 40% to 46%, if a woman’s first-degree relative had cancer. Such a pattern also holds for non-cancer mortality. The lower panel of Figure 4 shows that the conditional cumulative non-cancer increased if the first-degree relative had died of non-cancer, and decreased if the first-degree relative had cancer. The lifetime risk of non-cancer death increased from 58% to 61%, if a woman’s first-degree relative had died of non-cancer.

We used simulations to study the performance of the estimation procedure proposed in the previous section. Initially, we generated 5,000 pairs of (*T*_{1}, *T*_{2}) and (*Y*_{1}, *Y*_{2}) according to the proposed model described in Section 2. Specifically, we first generated (*T*_{1}, *T*_{2}) followed by (*Y*_{1}, *Y*_{2}) conditional on (*T*_{1}, *T*_{2}). We assumed the marginal distribution of *T*_{1} and *T*_{2} follows a Weibull model with the shape and scale parameters estimated from the WAS data. The number of knots and cutoff values used to define the piecewise cross-rations and odds-ratios were set at the same values as in the WAS data analysis (*w*_{1} = 50, *w*_{2} = 70, *w*_{3} = ∞). We took the constant value of 1.2 for all the cross-ratios, because it is close to the estimates seen in the WAS study. We generated the failure time *t*_{1} for the first member in each pair from the Weibull model and the bivariate survival function values on the bivariate knots on each grid in the sample space in the order laid out in the Appendix. Then we generated the event time for the second pair member, *t*_{2}, by solving

$$u=P({T}_{2}>{t}_{2}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{1}={t}_{1})={S}_{{A}_{{i}_{1}{i}_{2}}}{({t}_{1},{t}_{2})}^{\theta {i}_{1}{i}_{2}}{\displaystyle \prod _{k=2}^{{i}_{2}}{S}_{{A}_{{i}_{1}k}}{({t}_{1},{w}_{k-1})}^{-\{{\theta}_{{i}_{1}k}-{\theta}_{{i}_{1}(k-1)}\}}S{({t}_{1})}^{-{\theta}_{{i}_{1}1}},}$$

where *u* is a uniform deviate and *i*_{1}, *i*_{2} are the knots such that (*t*_{1}, *t*_{2}) *A*_{i1,i2}. We fitted a fourth-degree polynomial model to the probability of failure type being cancer (Figure 1) and used the fitted model to generate bivariate failure types. The fitted model is given by

$$P({Y}_{j}=1\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{T}_{j}=t)=0.11+1.37{t}^{*}+6.57{t}^{*2}+-17.48{t}^{*3}+9.43{t}^{*4},$$

where *t** = *t*/100. The failure type for the first member of each pair was generated from the bernoulli distribution with *p* = *P*(*Y*_{1} = 1 | *T*_{1} = *t*_{1}). The failure type for the second member was generated from the bernoulli distribution with *p* = *P*(*Y*_{2} = 1 | *T*_{1} = *t*_{1}, *T*_{2} = *t*_{2}, *Y*_{1} = *k*; ϕ_{i1i2}), where *k* is the failure type of the first member. Each individual is subject to independent censoring, where for the first member of each pair the censoring variable follows a normal distribution *N*(82, 14) corresponding to 40% censoring and *N*(60, 10) for the second member corresponding to 75% censoring. The censoring pattern was devised to mimic that of the (mother, sister) pairs in the WAS study. Randomly generated failure times were truncated to integer to represent ages at onset as recorded in the WAS study.

The simulations were repeated 1,000 times and the results were summarized in Table 2. It shows that most of the Monte-Carlo means are close to the true parameter values. One exception is ϕ_{13}, where _{13} is slightly over-estimated. Consistent with the observations seen in the example, the standard errors of odds-ratios overall are larger than those of cross-ratios. This is due to the fact that for the cross-ratios all paired data, whether censored or not, contribute to the estimation, but for the odds-ratios, only pairs with both failures observed enter the likelihood for estimation. In addition, the estimate of the marginal probability of failure type given the individual’s failure time is highly variable if the number of failures at that failure time is small. Therefore, in order to obtain a stable estimate of ϕ_{ij}, an adequate number of bivariate failures for each bivariate failure type in each grid region is essential. The coverage probability for most of the parameters are close to the 95% nominal level. Hence the parameter estimators are approximately normally distributed.

We performed additional simulations to compare the performance of our estimator of the cause-specific cross-ratio with that of Bandeen-Roche and Ning (2008). Their method was an extension of the non-parametric estimator of the Kendall’s tau ignoring competing risks (Oakes, 1982). In their approach, only data with failure observed in both pair members and with failure type of interest were included. For example, if one is interested in cancer-cancer cross-ratio, their method would eliminate all censored pairs, pairs with cancer vs. non-cancer death, and pairs with non-cancer death in both members.

We first generated the bivariate competing risk data using the same scenario as presented above. Under this scenario, the probability of event type is time-dependent, and the odds-ratios for the bivariate failure types are piecewise constant. As a result, the cause-specific cross-ratio according to (4) is time-varying. In our approach, the cause-specific cross-ratio in each grid element was averaged with the bivariate incidence function as its weight to represent the cause-specific cross-ration in that grid element, whereas the Bandeen-Roche and Ning approach used the probability of discordance as the weight. The simulation results are presented in Table 3. The cause-specific cross-ratios for bivariate failure type (1, 1) was underestimated by the Bandeen-Roche and Ning approach except in the last grid element [70, ∞) × [70, ∞) where the estimate was overestimated. However, as one referee pointed out, since the two approaches use different weights and estimate different quantities under the time-vary cause-specific cross-ratio model, the two estimates are not directly comparable. Nevertheless, the estimate from our approach had substantially lower standard deviation.

Second, we generated bivariate failure time data with constant cause-specific cross-ratio. Specifically, bivariate failure times to first event were generated according to the gamma frailty model with same parameter values as our original simulation study presented in the manuscript (i.e. Weibull marginal survival and constant cross-ratio θ = 1.2). The bivariate failure types were generated from the beta-bernoulli distribution as described in Bandeen-Roche and Liang (2002) with scale parameter 1 and the probability of failure type being 1 equal to 0.2. Under this scenario, the cause-specific cross-ratio for bivariate failure types (1, 1) equals 3.6. In the case of no censoring, the sample size was set at *n* = 1000. With censoring, censoring time was generated from normal with mean 65 and standard deviation 10. Approximately 25% of the failure times were censored and the sample size was set at *n* = 2000. Since the probability of failure type being 1 does not change with time, the estimate of the probability of failure type given the failure time from the simulated data fluctuates around the true value. Hence in our estimation procedure for the odds ratio, the marginal probability of failure type was assumed constant. We mimicked the simulation study of Bandeen-Roche and Ning (2008) in setting the bins by bisecting each time dimension at its median in each simulation run. Under this scenario, the averaged cause-specific cross-ratio in each grid is constant regardless of the weight used in averaging and equals 3.6. The simulation results are summarized in Table 4. It shows that when there is no censoring, the estimate of the cross-ratio for bivariate failure types (1, 1) has little bias for both approaches. Consistent with the findings in the second simulation study above, the variance of the Bandeen-Roche and Ning’ approach is considerably larger than that of our approach. Such an inflation in the variance is due to the fact that our approach uses all data in the estimation, while the Bandeen-Roche and Ning’s appraoch only uses the pairs with bivariate failure types (1,1), which on average consists of only 12% of the total number of pairs. With bivariate failure types (2,2) which consists of about 75% of the data, the variability of the estimate of the Bandeen-Roche and Ning approach is still larger but closer to our approach (results not shown). In the presence of censoring, The Bandeen-Roche and Ning’s appraoch is more biased and the variance of their estimate is more inflated than for the uncensored case.

In summary, these simulation studies indicated that, in general, our proposed approach has less bias and is more efficient than Bandeen-Roche and Ning’s approach.

In this paper, we have proposed a flexible model for bivariate competing risk data. In our approach, we decompose the bivariate competing risk data into bivariate times to first failure and bivariate failure types given the failure times. To each bivariate component, we apply a piecewise-constant association model to measure the dependency. We have shown that with a given marginal distribution of time to first failure and piecewise constant cross-ratios, the bivariate distribution is determined. With this approach, we are able to assess the contribution of each component of the bivariate data to the cause-specific association. In practice, the knots and the number of knots used to specify the piecewise-constant cross-ratios and odds-ratios are primarily determined by the data such that sufficient number of paired failures in each grid elements are observed. Such an approach is often necessary in the analysis of rare events such as cancer.

We have proposed a semi-parametric two-stage procedure for estimation. For the estimation of cross-ratios θ_{ij}s, at the first stage, the marginal distribution of the first failure time is estimated non-parametrically by the Nelson estimate. In the second stage, these cross-ratios are estimated sequentially in each coordinate such that the estimation of each cross-ratio depends on the estimates of the cross-ratio and in turn the bivariate survival function at times preceding it. For the estimation of odds-ratios ϕ_{ij}s, at the first stage, the probability of event type given the bivariate failure times is estimated non-parametrically. In the WAS study, the integer age was recorded and thus the probability of failure type was estimated at each observed age at onset. In the case of continuous time, the failure time needs to be binned in order to reliably estimate the probability of failure type. At the second stage, the odds-ratio in each region is estimated assuming the event type probability is known and using the paired data with failures observed in that regions for both pair members. Since the number of observations used in estimating each odds-ratio is smaller than that used in estimating the cross-ratio, the variation of the odds-ratio estimates tends to be larger than that of the cross-ratio. This is seen in the analysis of the WAS data as well as in simulated data. Therefore, in order to be able to assess the effect of the event type of one member on that of the other family member, it is necessary to observe sufficient number of failures for each bivariate failure type. The number of bivariate failures required to have a reliable estimate for each odds-ratio depends on the probability of failure type in each sub-region. If the probability does not change with time, then the rule of thumb for the estimation of odds-ratio in a 2 × 2 table applies here. In this case, a minimal number of 30 pairs would usually be sufficient. However, if the failure type probability changes with time, as the case for the WAS study shown in Figure 1, a larger sample size is required. This is because the non-parametric estimate of the failure type probability is more variable and hence the estimate of the odds-ratio is more variable as well.

Because of the complex nested structure of the piecewise-constant bivariate survival model for the times to first failure and sequential dependency of the estimation of cross-ratios, derivation of the theoretical properties of the proposed estimates is complex and a topic for future research. We used the bootstrap to estimate the standard errors and to draw inference. For the WAS study data, it took about one minute of CPU to estimate the parameters. With 500 bootstrap samples, it took about 8 hours to compute the standard errors. We used simulations to study the properties of the proposed parameter estimates. The simulation result shows that there is little bias for the parameter estimates, and the estimators are close to be normally distributed.

There are several interesting findings from our analysis of the WAS data. In this cohort, the majority of the cancers among women are breast and ovarian cancers, and the probability of having cancer is higher than non-cancer death when the age at onset is between 30 to 70 years old. An individual’s failure type also affects her first-degree relative’s failure type especially at young ages. Based on _{11} = 5.22, the probability of having cancer for a woman who had an event before age 50 is more than 5 times higher when her first-degree relative had cancer before age 50 than if her first-degree relative died of non-cancer before age 50. However, such a large increase in the conditional probability of failure type being cancer before age 50 does not result in the same degree of increase in the cumulative risk of cancer. For example, a woman’s cumulative risk of having cancer at age 30–50 increases 1.4–1.9 times if her first-degree relative had cancer between age 30 and 50 then if her first-degree relative died of non-cancer between age 30 and 50. Overall, a woman’s cumulative risk at all ages is increased if her failure type is the same as her first-degree relative’s. This translates into 6% and 3% increase of lifetime risk of cancer and non-cancer mortality, respectively.

The authors wish to thank Dr. Nilanjan Chatterjee for kindly providing the WAS study data used in this paper. The authors thank the associate editor and referees for their thorough review and insightful comments.

We arrange the sample space of *T*_{1} and *T*_{2} into a grid with elements *A _{ij}*,

A diagram used to illustrate the order of constructing the bivariate survival function on the boundaries.

We show that the bivariate survival function (2) is completely determined by the two marginal survival functions *S*_{1}, *S*_{2} and the piecewise constant cross-ratios θ_{ij}’s. Notice that if the survival functions on the bottom and left boundaries in each region *A _{ij}* are known, the joint survival function in that region is determined according to (2). Hence it is sufficient to show that the survival functions on the bottom and left boundaries for each

- Bandeen-Roche K, Liang KY. Modelling multivariate failure time associations in the presence of a competing risk. Biometrika. 2002;89:299–314.
- Bandeen-Roche K, Ning J. Nonparametric estimation of bivariate failure time associations in the presence of a competing risk. Biometrika. 2008;95:221–232. [PMC free article] [PubMed]
- Chatterjee N, Hartge P, Wacholder S. Adjustment for competing risk in kin-cohort estimation. Genetic Epidemiology. 2003;25:303–313. [PubMed]
- Chatterjee N, Kalaylioglu Z, Shih JH, Gail M. Casecontrol and case-only designs with genotype and family history data: Estimating relative risk, residual familial aggregation, and cumulative risk. Biometrics. 2006;62:36–48. [PubMed]
- Cheng Yu, Fine Jason P, Kosorok Michael R. Nonparametric Association Analysis of Bivariate Competing-Risks Data Journal of the American Statistical Association. 2007;102:1407–1415.
- Cheng Y, Fine JP. Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data. Biometrika. 2008;95:233–240.
- Cheng Y, Fine JP, Kosorok MR. Nonparametric association analysis of exchangeable clustered competing risks data. Biometrics. 2009;Vol. 65:385–393. [PubMed]
- Clayton DG. A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika. 1978;65:141–151.
- Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman and Hall; 1993.
- Hougaard P, Harvald B, Holm NV. Measuring the similarities between the lifetimes of adult Danish twins born between 1881–1930. Journal of the American Statistical Association. 1992;87:17–24.
- Hougaard P. Analysis of multivariate survival data. Springer-Verlag Inc; 2001.
- Li H, Yang P, Schwartz AG. Analysis of age of onset data from case-control family studies. Biometrics. 1998;54:1030–1039. [PubMed]
- Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. John Wiley & Sons; 2002.
- Nan B, Lin X, Lisabeth LD, Harlow SD. Piecewise constant cross-ratio estimation for association of age at a marker event and age at menopause. Journal of the American Statistical Association. 2006;101:65–77.
- Oakes D. A model for association in bivariate survival data. Journal of Royal Statistical Society, B. 1982;44:414–422.
- Oakes D. Bivariate survival models induced by frailties. Journal of the American Statistical Association. 1989;84:487–493.
- Prentice RL, Kalbfleisch JD, Peterson AV, Jr, Flournoy N, Farewell VT, Breslow NE. The analysis of failure times in the presence of competing risks. Biometrics. 1978;34:541–554. [PubMed]
- Shih JH, Louis TA. Inferences on the association parameter in copula models for bivariate survival data. Biometrics. 1995;51:1384–1399. [PubMed]
- Struewing JP, Hartge P, Wacholder S, Baker SM, Berlin M, McAdams M, Timmerman MM, Lawrence BC, Tucker MA. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. The New England Journal of Medicine. 1997;336:1401–1408. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |