PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Scand Stat Theory Appl. Author manuscript; available in PMC Sep 1, 2012.
Published in final edited form as:
Scand Stat Theory Appl. Sep 2011; 38(3): 409–423.
doi:  10.1111/j.1467-9469.2010.00720.x
PMCID: PMC3172162
NIHMSID: NIHMS244415
The additive risk model for estimation of effect of haplotype match in BMT studies
Thomas H. Scheike
Department of Biostatistics, University of Copenhagen
Torben Martinussen
Department of Biostatistics, University of Southern Denmark
Mei-Jie Zhang
Division of Biostatistics, Medical College of Wisconsin
Thomas H. Scheike, Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5B, P.O.B. 2099, DK-1014 Copenhagen K, Denmark ; ts/at/biostat.ku.dk
In this paper we consider a problem from bone marrow transplant (BMT) studies where there is interest on assessing the effect of haplotype match for donor and patient on the overall survival. The BMT study we consider is based on donors and patients that are genotype matched, and this therefore leads to a missing data problem. We show how Aalen's additive risk model can be applied in this setting with the benefit that the time-varying haplo-match effect can be easily studied. This problem has not been considered before, and the standard approach where one would use the EM-algorithm cannot be applied for this model because the likelihood is hard to evaluate without additional assumptions. We suggest an approach based on multivariate estimating equations that are solved using a recursive structure. This approach leads to an estimator where the large sample properties can be developed using product-integration theory. Small sample properties are investigated using simulations in a setting that mimics the motivating haplo-match problem.
Keywords: Aalen model, additive risk model, estimating equation, flexible modeling, haplo-match effects, haplo-type effects, missing data, semiparametric modeling, survival data
Hematopoietic cell transplantation (HCT) is a life saving procedure for many cancer patients, but it also has side effects, especially for transplants from unrelated donors, for example when the donor is not from the patient's family. Graft-versus-host disease (GVHD) complications is one of the major causes for treatment related death. To reduce GVHD, human leukocyte antigen (HLA) fully matched graft needs to be selected. When a patient does not have an HLA-identical sibling donor, then an HLA fully matched unrelated transplant often will be considered. For an HLA fully matched unrelated transplant, the donor is selected such that the patient and donor have identical HLA genotypes, but they may not necessarily have matching haplotype pairs. For example, suppose the donor and patient have identical HLA-A/B/DRB1 alleles of {(A1, A3), (B8, B7), (DR3, DR2)}. Now, there are four potential haplotypes that are consistent with this genotype, namely {(A1, B8, DR3), (A3, B7, DR2)}, {(A3, B8, DR3), (A1, B7, DR2)}, {(A1, B8, DR2), (A3, B7, DR3)}, and {(A3, B8, DR2), (A1, B7, DR3)}. Here, a donor with {(A1, B8, DR3), (A3, B7, DR2)} and a patient with {(A1, B8, DR3), (A3, B7, DR2)} are haplotype matched, but a donor with {(A3, B8, DR3), (A1, B7, DR2)} and a patient with {(A1, B8, DR3), (A3, B7, DR2)} are haplotype miss-matched, see Petersdorf et al. (2007) for details. Among HLA-genotype matched transplants with unrelated donors it has not been fully investigated whether there is an effect of haplotype matching. This is an important question since if a positive beneficial haplotype matching effect can be identified, then patient and physician need to search for a haplotype matched donor among those available HLA-genotype fully matched potential donors. Among a patient and donor with identical HLA genotype, the typical probability of a HLA haplotype match is around 70% (Petersdorf et al., 2007).
Recently, Petersdorf et al. (2007) studied the effect of major histocompatibility complex (MHC) haplotype match of donor and patient after HLA-identical unrelated HCT for leukemia. Their analysis was based on observing the haplotype pairs directly and therefore was carried out as a Cox regression analysis. Identifying the haplotype pairs is a very costly, time-consuming and not entirely unproblematic procedure. In this study, we show how modeling of the missing haplotypes for donor and patient can also be used to address the issue using only the observed genotype and failure time data. For illustration purpose, we consider a similar cohort of patients with HLA-identical unrelated HCT for acute leukemia. The data was selected from the Center for International Blood and Marrow Transplant Research (CIBMTR). We found a strong time-varying effect for the haplotype matching, and therefore the use of Cox's proportional hazards model is not appropriate to assess the effect of haplo-type match for this transplant data. We instead consider Aalen's additive intensity model (Aalen, 1989). This model allows the haplotype matching effect to be varying over time and the time-dependent excess risk can be estimated directly. This excess risk translates directly into a simple relative survival percentage. One particular advantage of this approach is that the absolute size of the effect is directly accessible and that the model does not rely on any particular structure for the effect.
Missing data for survival analysis has been considered in detail in many different settings for Cox's regression model. For example with missing partial covariates for some patients and in frailty models for unobserved cluster effect. Estimating haplotype effects using Cox's regression model has been considered in a large number of papers (Schaid et al., 2002; Zaykin et al., 2002; Schaid, 2004; Zhao et al., 2003; Tregouet & Tiret, 2004; Lin & Zeng, 2006; Epstein & Satten, 2003; Chen et al., 2004; Scheike et al., 2010). One of the attractions for fitting a Cox's regression model is that the EM-algorithm can be applied in this setting. This makes it easy to deal with the missing data. Alternative estimating equations have been considered in Zucker (2005); Gorfine et al. (2005); Scheike (2006); Scheike et al. (2010).
There has been no related statistical work on how to fit an additive intensity model in this setting where no simple likelihood is available. In this paper, we suggest a new approach that fits the additive risk model without smoothing or additional assumptions. The proposed method is simple to implement and use. The basic idea is to solve the estimating equations recursively to deal with the structure of the model appropriately. It turns out that the large sample properties of the proposed estimators can be derived based on product integration theory (Gill & Johansen, 1990).
The model and estimating procedures are presented in section 2. Section 3 gives large sample properties and also a test for investigating time-invariance. Small sample performs are assessed by simulation studies that are presented in section 4. In section 5 we consider a worked example that assesses the effect of haplotype matching for HCT patients. Section 6 concludes with some final remarks. Proofs are collected in the Appendix.
Let equation M1 be the failure time of interest and let C be a potential censoring time such that we observe equation M2 and = equation M3. Let N(t) = I(Tt, δ = 1). We assume that, given covariates X and observed haplotype pairs for the donor Hd = (Hd1, Hd2) and the patient Hp = (Hp1, Hp2), the counting process N(t) has an additive intensity
equation M4
(1)
where R(t) = I(Tt) indicates individual is at risk at time t, and X(Hd, Hp) is a function of the observed covariates and haplotypes of donors and patients. In this paper, we consider the design where X(Hd, Hp)T = {XT, I(Hd = Hp)}. This simple design gives the effect of haplotype matching between donor and patient.
If Hd and Hp are known, a simple standard survival analysis based on the additive model can be used. However, the problem here is that we only observe the underlying HLA genotypes related to Hd and Hp. For HLA-identical unrelated transplant data, the patient and donor have identical genotypes, but may not necessarily have matched haplotype pairs (see section 1).
To estimate the effect of haplo-match we need to model how the haplotype pairs relate to the observed genotypes. We assume that the probability of a specific haplotype pair for both donor and patient H = (hi, hj) can be computed under Hardy-Weinberg equilibrium as
equation M5
(2)
where πi, i = 1, .., K gives the distribution of the considered haplotypes. One may consider any relevant parametric model for the frequencies of the haplotype pairs. Explicitly, we parameterize the haplotype frequencies as
equation M6
with ηj [set membership] ] − ∞, ∞[ and with the identifiability restriction ηK = 0 (Σ πi = 1). We denote the haplotype parameters as = ν = (η1,…, ηK−1). For the HCT data, these parameters are unknown and must be estimated from the data, but one may also search known data-bases to get estimates of these parameters. We here treat the haplotype parameters as unknown. In the HCT data, we have a very rich haplotype space. Thus, a very large number of parameters need to be estimated and since we only have limited observed data, we will make further simplifying assumptions about the haplotype frequencies (see section 5).
Based on this model we can infer the conditional distribution of the underlying haplotype pairs given the observed genotype, P (H = h|G = g). The observed intensity given genotype can be written as
equation M7
where equation M8 and Σhd,hp[set membership]G is the sum over all haplotype pairs that are consistent with the observed genotype. For unrelated donors and patients, we can assume that P(Hd = hd, Hp = hp|G) = P(Hd = hd|G) × P(Hp = hp|G). For related donors and patients this independence assumption does not hold. Here, the observed hazard function has an additive structure and it also depends on the cumulated regression function evaluated at time t, i.e. just before time t.
The likelihood for a subject is
equation M9
and the score equation for ν can be partitioned into two parts
equation M10
where UG is the part of the score that relates to Pν(G) and UN relates to the information contained in the survival data.
For the nonparametric additive structure of the hazard function, the likelihood is hard to work with even when haplotype pairs, H, are fully observed. Therefore, we consider an alternative approach for estimating the nonparametric regression functions. We assume that n i.i.d. observations are available in the time-interval [0, τ], for some τ < ∞. Let N(t) = {N1, ..,Nn(t)}T be the n-dimensional counting process of all subjects with corresponding intensity λ(t) = {λ1(t), …, λn(t)}T.
We first note that estimation can be done recursively. Assuming for the moment that ν is known and equation M11 for all t smaller than the first jump-time. The intensity of Ni(t) can be written as equation M12, where equation M13. This leads to a least square estimator
equation M14
where the ith row of equation M15 and equation M16 is the generalized weighted inverse of equation M17, which is given by
equation M18
Here the diagonal elements of the weight function, W are N (t, ν,Gi, Xi)/T(t, ν, Gi, Xi). The efficient weight is equivalent to the efficient weight in the one-dimensional case, which involves inverse of X(hd, hp)T α(t), and it can be implemented using a two-stage approach as for the additive risk model (Huffer & McKeague, 1991).
Next, we look at an estimating equation for ν based on the survival data. We first need some definitions. Let the ith row of equation M19 where
equation M20
is the derivative of the intensity with respect to ν multiplied by the weight described above. Here Dνh(x) denotes the derivative with respect to ν evaluated at x.
This leads to an estimating equation for the unknown ν based on the survival data
equation M21
We now replace A(t) with Â(t) and add the score contribution coming from P (G) to obtain the following estimating equation for ν:
equation M22
(3)
Then, ν is estimated by solving U(ν, Â) = 0 and denoted by equation M23. Finally, we then estimate A(t) by least square estimator with equation M24.
One may ignore the information about ν contained in the survival data and base estimation solely on the genotype data. This we refer to as the two-stage approach.
In this Section we outline the large sample properties and derive estimators for the variance of the suggested estimators. When the estimating function, U, is evaluated in the true parameter values ν0 and A it becomes a martingale, but the asymptotic analysis is complicated by the fact that A is estimated using a recursively constructed estimator.
With equation M25 the proper root of (3) and with additional regularity conditions it follows that it is consistent and asymptotically normal, we refer to the Appendix for further details. But the key arguments are based on the one dimensional case as in Bagdonavicius & Nikulin (1999); Chen et al. (2002); Dabrowska (2005); Gorfine et al. (2005); Zucker (2005). The key to the extension to the multivariate additive risk model is based on the powerful and elegant product integration theory given by Gill & Johansen (1990). This also is utilized by Dabrowska (2005).
It can be established that
equation M26
where Ui(t, ν0) is defined in the second equation, equation M27 is the basic martingale and qi(s, ν) is a predictable function given in (9) in the Appendix, and equation M28 the score for subject i corresponding to the genotype log-likelihood. Based on this i.i.d. decomposition a simple robust estimator of the variance of n−1/2U(t, ν0) is given as
equation M29
(4)
where equation M30 and equation M31 and equation M32 are estimators of qi and Mi obtained by using the estimates and empirical versions of all expressions.
Based on this it can be derived that equation M33 converges to a zero-mean normal distribution with covariance matrix that can be consistently estimated by
equation M34
where equation M35 denotes minus of the derivative of U with respect to ν.
It also follows that equation M36 converges to a Gaussian process. The key is again to establish an i.i.d. decomposition. It turns out, based on a Taylor series expansion, that (up to an op(1) term)
equation M37
where the Li's are defined in (10) in the Appendix and they are i.i.d. processes.
Let equation M38 denote the estimator of Li by plugging in the estimates of the unknown quantities, see Appendix for details. The variance of n1/2{Â(t) – A(t)} may therefore be estimated by
equation M39
(5)
The asymptotic distribution can be approximated by a resampling scheme
equation M40
(6)
where G1, …,Gn are independent standard normals and are independent of the counting processes and their covariates.
By a resampling technique we can construct (1 – α)% confidence bands for each cumulative regression coefficient function of Aj(t), for t [set membership] [0, τ], and derive a supremum test for testing the hypothesis of H0 : αj(t) = 0, for t [set membership] [0, τ]. We omit the details here.
3.1 Testing of constant effects and significant effects
To derive a formal test to see if a given effect varies over time or if it leads to constant excess risk, we consider a supremum test that we evaluate based on the resampling technique. Considering for example the jth component of the model, αj(t), we now wish to test H0 : αj(t) [equivalent] γj for t [set membership] [0, τ]. As in Martinussen & Scheike (2006) a test can be based on a supremum test statistic
equation M41
Let equation M42 where equation M43 is the kth realization of the resampling process of equation M44. Based on derived large sample property and under H0, equation M45 can be asymptotically approximated by
equation M46
for κ = 1, …, K. A formal test can be derived by comparing observed equation M47 and simulated equation M48
We simulated data similar to the BMT example data considered in section 5. The underlying additive hazard model had a baseline at 0.3 with the effect of haplotype match being −0.2 in the first 1.5 time units and 0 afterwards. The survival times were right censored at 3 time units. In the 2054 patients we consider below we found that there where 20, 31 and 13 different allele types for the loci A, B, and DRB1, and that this lead to a total of 2,325 possible haplotypes. These haplotypes where asssumed to have frequencies similar to those we found in a structured MLE based on the genotype data where we found that 1901 of the haplotypes had frequency 9.95 * 10−5, 214 had frequency 0.00072, 192 had frequency 0.0019, 14 had frequency 0.0087, 2 had frequency 0.027, 1 had frequency 0.042, and 1 had frequency 0.083, respectively. We tried various other haplotype distributions that were not based directly on the underlying structured model, and we will study this issue further in future work. We then considered sample sizes of 2400, 1200, 600 and 300 that lead to approximately 1100, 650, 320 and 160 number of deaths on average, respectively. 2000 random samples were generated for each simulation study, and the resampling based tests were based on 1000 resamples.
We present detailed results for sample size of 600. Figure 1(a) and 1(b) show the mean of the estimated cumulative intensity (broken line) and the true cumulative coefficient (solid line) for the baseline and the effect of the haplotype match, respectively. Figure 1(c) and 1(d) give the mean of the estimated standard errors (broken line) and the sample standard deviation of the realizations (solid line) for baseline and effect of the haplotype match, respectively. Figure 1 shows that the estimators are essentially unbiased and the standard errors are well estimated by the derived estimator. There is some slight underestimation of the variance, that diminishes when the sample sizes increase (figures are not shown here).
Figure 1
Figure 1
Mean of estimated cumulative hazards (broken line) versus true cumulative hazards (solid line) for the baseline and haplomatch effect (Panels a and b), and observed pointwise standard deviation of estimates (solid line) versus mean of estimated standard (more ...)
The standard errors of the cumulative coefficients are approximately halved if the haplotype match had been observed. We also considered the effect of estimating the haplotype parameters and this only had minor impact on the standard errors, estimates and coverage probabilities.
Figure 2 shows the pointwise coverage probabilities for the baseline intensity and for the effect of haplotype match, respectively. The slight bias in the variances result in coverage probabilities that are slightly lower for most of the time-axis. The initial problems with the coverage probabilities are well known and due restrictions close to 0, these problems may be remedied by choosing an appropriate transformation.
Figure 2
Figure 2
Observed pointwise coverage of 95 % pointwise confidence interval for sample size 500 for the cumulative baseline (solid line) and cumulative excess effect of haplomatch (broken line).
We also considered the test for constant effects developed earlier as well as the coverage of the confidence bands. The observed coverage is considered in the time-period from 0.3 to 3 to disregard the initial problematic time period. This resulted in the observed levels reported in Table 1.
Table 1
Table 1
Observed power for test of constant effects and coverage probabilities for confidence bands for model where baseline is constant and haplotype match effect is non-constant.
The test for constant effect achieves the nominal 5% power for the constant effects for the baseline and the power for the time-varying effect of the haplotype match increases with increasing sample size. The coverage probabilities of confidence bands are in an acceptable level. Slightly to low for the baseline, but close to the true 95% for the haplo-match effect.
The finite sample properties are well described by the estimators. Both the pointwise and uniform description of the uncertainty works well. The slight problems are similar to those found for the additive intensity model with known covariates.
Recently, Petersdorf et al. (2007) studied the effect of MHC haplotype match based on 246 leukemia patients. All these 246 patients received a HLA-identical unrelated HCT between 1986 and 2003. Their analysis was based on observing the haplotypes directly. This is a very costly and time-consuming procedure. They identified 191 (78%) transplants that were haplotype-matched. They demonstrated that the MHC haplotype match of donor and patient had significant impact on the transplantation outcome of developing acute graft-versus-host disease (GVHD) and recurrent malignancy, but had no impact on the treatment related mortality (TRM) which is defined as death in complete remission and overall mortality. For the illustration purpose, we consider a similar cohort of patients with HLA-identical unrelated HCT for leukemia and transplanted between 1995 and 2007, and donor-patient matched on four loci HLA-A/B/C/DRB1 based on low-resolution (2 digits) classification. The data presented here were obtained from the Statistical Center of the Center for International Blood and Marrow Transplant Research (CIBMTR) located at the Medical College of Wisconsin. The analysis has not been reviewed or approved by the CIBMTR. The CIBMTR is comprised of clinical and basic scientists who share data on their transplant patients with CIBMTR Statistical Center.
The selected data consists of 5,145 leukemia patients (2,054 for acute myeloid leukemia (AML), 1,171 for acute lymphoblastic leukemia (ALL), 1,037 for chronic myelogenous leukemia (CML) and 883 for myelodysplastic syndrome (MDS)). The analysis showed that a haplotype match only had a significant effect on AML patients. For illustration purposes, we here focus on the overall survival of the 2,054 AML patients.
The genotype data used in estimation is based on the three-loci HLA-A, HLA-B and HLA-DRB1. This gives a very rich haplotype space. To reduce the total number of possible haplotypes, low-resolution (2 digits) classification of alleles at loci {A1, A2}, {B1, B2} and {DRB11, DRB12} were used to estimated haplotype frequencies. There were 20, 31 and 13 different allele types for the loci A, B, and DRB1, respectively. This leads to a total of 2,325 possible haplotypes. Some additional structure is needed to be able to estimate the parameters in this setting, and we grouped rare haplotypes into groups with a common haplotype frequency. We ended up with a model with 10 parameters where the 5 largest frequencies have their own parameters and other frequencies where allowed to be in 5 different groups. We tried modifying the models in many different ways but the results were quite robust to these choices. In the regression analysis below we adjust for disease stage at pre-transplant (high risk patient (36%) versus early or intermediate risk patient (64%)), age of patient (> 45 (50%) versus 19 – 45 (38%) versus ≤ 18 (12%) years), GVHD prophylaxis (Cyclosporine-A (CsA) ± other (44%) versus others (56%)), graft source (bone marrow (BM) (45%) vs peripheral blood (PB) (55%)) and year of transplant (1995–2001: early, (25%) versus 2002–2007 (75%)). We considered the survival for the first 90 months (τ = 90). There were 1,206 patients died within this study period.
We fitted an additive model to evaluate the effect of haplo-match by XT(HP, HD) = {XT, I(HP = HD)}, where X consists of a baseline, disease stage, age of patient, GVHD prophylaxis, graft source and transplant year. The estimated cumulative coefficients and 95% pointwise confidence intervals (solid lines) and with 95% confidence bands (broken lines) for covariates and haplo-match are given in Figure 3. Note that it appears that all covariates had time-varying effects and that the haplo-match effect was also timevarying.
Figure 3
Figure 3
Cumulative excess hazard for covariates based on additive model with 95% point-wise confidence intervals and 95% confidence bands (broken lines).
The formal tests for a constant excess risk lead to p-values for disease stage, age group 19 – 45, age group > 45, GVHD prophylaxis, graft source and year of transplant that are <0.001, 0.051, <0.001, 0.002, 0.014 and 0.002, respectively. The haplo-match effect lead to a p-value of p = 0.033. Figure 3 and these tests results indicate that a similar Cox regression analyses would not be appropriate here.
Figure 3 indicates the overall effect of matching haplotypes is significant even though the significance test only leads to a p-value of 0.055. Note, also that the test for constant effect also provides an alternative test for significance.
Also, the negative cumulative hazards for the haplo-matching indicates that patient with a haplo-matched HCT has a higher survival probability. Figure 4 gives the predicted survival probabilities of haplo-mismatch (panel a) and haplo-matched (panel b) for a young patient (≤ 18) with early or intermediate disease at pre-transplant, received other than CsA ± others for GVHD prophylaxis and received a PB transplant between 2002–2007. The survival predictions are provided with 95% pointwise confidence intervals (dotted lines) and with 95% confidence bands (broken lines). The predicted 5 year survival probabilities for haplo-mismatched and haplo-matched HCT are 55% (95% CI: 38%–72%) and 83% (64%–100%), respectively.
Figure 4
Figure 4
Predicted survival probabilities with 95% pointwise confidence intervals (dotted lines) and 95% confidence bands (broken lines) based on additive risk model for a specific patient (see text) with haplotype mismatched HCT (Panel a) and with haplotype matched (more ...)
This BMT example showed a positive beneficial effect in overall survival for haplotype matched transplant. As we discussed in section 1, this positive effect could lead to a new decision in choosing a best matched donor.
The popularity of the additive hazards model risk model has grown as it is well suited to study time-dynamics, that is whether or not effects of covariates change with time. In this paper we have demonstrated how the additive risk model can be applied also when there is missing data, and in particular how one can estimate effects of haplo-type match in BMT studies.
The large sample properties of the suggested estimators were derived using product integration theory, and these seem to give an accurate description of the variability even for very small samples. Furthermore, the formal test of whether or not covariate effects vary with time worked quite well in the simulations.
In the context of the the BMT example, we studied the effect of haplotype match, and here specific modeling of the haplotype frequencies were needed. It would be interesting to further consider how to make effects robust to misspecification of the haplotype modeling, and one may pursue ideas like Allen et al. (2005). Further, one may model the haplotypes with additional complexity to allow for lacking Hardy-Weinberg Equilibrium and recombination that is know to present on the HLA gene for the three loci we consider.
The approach is implemented in the R-package HaploSurvival, that is being developed by the first author.
We appreciate helpful and constructive suggestions by the Editors and a referee.
This research was supported by NIH-grant 2 R01 CA54706-10, and a grant from the Danish Research Council on “point process modeling and statistical inference”.
7 Appendix
We start by making the observation that (up to op(n−1))
equation M49
where M(s) is the (vector) counting process martingale, and where the time arguments were omitted for notational simplicity. Let equation M50 and let the limit of this quantity be denoted as s0(t). Note that the first term, Vn, by the martingale CLT, can be shown to converge to a Gaussian martingale process V(t) with variance equation M51. Note also that Vn(t) can be written as
equation M52
Also, a Taylor expansion using matrix derivatives yields (up to op(n−1/2))
equation M53
where the ith row of equation M54 is DAjfi(ν, A(t)), and therefore with
equation M55
This implies that equation M56 converges in distribution to a p-dimensional process W that satisfies the integral equation
equation M57
Denote the elements of ej(t) as ej,km(t). With the p × p matrices F(t) with elements equation M58 the solution can be written as
equation M59
Using the product integration formulae (Gill & Johansen, 1990) leads to the solution
equation M60
(7)
and where F(s, t) is the product integral of F. In the one-dimensional (commutative) case this equals exp equation M61. Note that the matrix F(s, t) is estimated consistently by the product of (the atoms)
equation M62
(8)
where equation M63. Using the product integral rather than the exponential in the one-dimensional case yields an estimator with the same asymptotic properties.
We now consider the estimating function for ν based on the counting process data, equation M64. By the martingale decomposition
equation M65
By a Taylor expansion, as before, it may be shown that the difference between the last two terms in the previous equation (up to op(n−1/2)) equals
equation M66
With the matrices in each of the terms converging to equation M67.
The last two term therefore equals (op(n−1/2))
equation M68
that can be written as
equation M69
with G(t) is an q × p matrix with elements equation M70. Now changing the order of integration and using the structure of W (t) one gets
equation M71
where
equation M72
Define
equation M73
(9)
Then one can write the estimating function as follows (up to an op(1) term)
equation M74
This is a sum of i.i.d. terms (or a martingale) and therefore converges to a normal distribution with variance that is estimated by the robust estimator given in (4). The total score for ν is UN0) + UG0)
Based on a Taylor series expansion (up to an op(1) term) we find that
equation M75
where
equation M76
(10)
and with P(t, ν0) the limit of equation M77 defined in (11), with I(τ) the limit of n−1I(τ, ν0) (12), and with the product integral F(s, t) defined in (7).
The derivative ofÃ(t) with respect to ν is
equation M78
(11)
The derivative with respect to ν of the estimating function, U, is
equation M79
(12)
Based on the derivation of the estimating function it straightforward to do a Newton-Raphson iteration to find a solution to the estimating equations. Note that the first term is a martingale when evaluated in the true parameter, and therefore will converge towards 0 in probability.
Note also that an estimator of F(s, t) is given in (8).
  • Aalen OO. A linear regression model for the analysis of life times. Statist. Med. 1989;8:907–925. [PubMed]
  • Allen AS, Satten GA, Tsiatis AA. Locally-efficient robust estimation of haplotype-disease association in family-based studies. Biometrika. 2005;92:559–571.
  • Bagdonavicius V, Nikulin M. Generalised proportional hazards model based on modified parital likelihood. Lifetime Data Anal. 1999;5:329–350. [PubMed]
  • Chen J, Peters U, Foster C, Chatterjee N. A haplotype-based test of association using data from cohort and nested case-control epidemiologic studies. Human Heredity. 2004;58:18–29. [PubMed]
  • Chen K, Jin Z, Ying Z. Semiparametric analysis of transformation models with censored data. Biometrika. 2002;89:659–668.
  • Dabrowska DM. Quantile regession in transfomation models. arXiv:math.ST. 2005;05115082v1:1–34.
  • Epstein MP, Satten GA. Inference on haplotype effects in case-control studies using unphased genotype data. American Journal of Human Genetics. 2003;73:1316–1329. [PubMed]
  • Gill RD, Johansen S. A survey of product-integration with a view towards application in survival analysis. Ann. Statist. 1990;18:1501–1555.
  • Gorfine M, Zucker D, Hsu L. Prospective survival analysis with a general semiparametric shared frailty model: A pseudo full likelihood approach. Biometrika. 2005;93:735–741. [PMC free article] [PubMed]
  • Huffer FW, McKeague IW. Weighted least squares estimation for Aalen's additive risk model. J. Amer. Statist. Assoc. 1991;86:114–129.
  • Lin D, Zeng D. Likelihood based inference on haplotype effects in genetic association studies. J. Amer. Statist. Assoc. 2006;101:89–110.
  • Martinussen T, Scheike T. Dynamic Regression Models for Survival Data. Springer; Verlag New York: 2006.
  • Petersdorf E, Malkki M, Gooley T, Martin P, Guo Z. MHC haplotype matcing for unrelated hematopoietic cell transplantation. PLOS medicine. 2007;4:59–68.
  • Schaid DJ. Evaluating associations of haplotypes with traits. Genetic Epidemiology. 2004;27:348–364. [PubMed]
  • Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association between traits and haplotypes when the linkage phase is ambiguous. American Journal of Human Genetics. 2002;70:425–434. [PubMed]
  • Scheike T, Martinussen T, Silver J. Estimating haplotype effects for survival data. Biometrics. 2010 DOI: 10.1111/j.1541-0420.2009.01329.x. [PubMed]
  • Scheike TH. A flexible semiparametric transformation model for survival data. Lifetime Data Anal. 2006;12:461–80. [PubMed]
  • Tregouet D, Tiret L. Cox proportional hazards survival regression in haplotype-based association analysis using the stochastic-em algorithm. European Journal of Human Genetics. 2004;12:971–974. [PubMed]
  • Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Human Heredity. 2002;53:79–91. [PubMed]
  • Zhao LP, Li SS, Khalid N. A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case-control studies. American Journal of Human Genetics. 2003;72:1231–1250. [PubMed]
  • Zucker DM. A pseudo-partial likelihood method for semiparametric survival regression with covariate errors. J. Amer. Statist. Assoc. 2005;100:1264–1277.