Home | About | Journals | Submit | Contact Us | Français |

**|**Biostatistics**|**PMC3520499

Formats

Article sections

- Abstract
- 1. Introduction
- 2. Mark-specific PH model with multivariate marks
- 3. Goodness-of-fit tests
- 4. Application to a vaccine efficacy trial
- 5. Simulation study
- Supplementary material
- Funding
- Supplementary Material
- References

Authors

Related links

Biostatistics. 2013 January; 14(1): 60–74.

Published online 2012 July 3. doi: 10.1093/biostatistics/kxs022

PMCID: PMC3520499

Yanqing Sun

Department of Mathematics and Statistics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA

Department of Biostatistics, University of Washington, and Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA

Received 2011 December 1; Revised 2012 April 6; Accepted 2012 May 22.

Copyright © The Author 2012. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

This article has been cited by other articles in PMC.

For time-to-event data with finitely many competing risks, the proportional hazards model has been a popular tool for relating the cause-specific outcomes to covariates (Prentice *and others*, 1978. The analysis of failure time in the presence of competing risks. *Biometrics* 34, **541**–**554**). Inspired by previous research in HIV vaccine efficacy trials, the cause of failure is replaced by a continuous mark observed only in subjects who fail. This article studies an extension of this approach to allow a multivariate continuum of competing risks, to better account for the fact that the candidate HIV vaccines tested in efficacy trials have contained multiple HIV sequences, with a purpose to elicit multiple types of immune response that recognize and block different types of HIV viruses. We develop inference for the proportional hazards model in which the regression parameters depend parametrically on the marks, to avoid the curse of dimensionality, and the baseline hazard depends nonparametrically on both time and marks. Goodness-of-fit tests are constructed based on generalized weighted martingale residuals. The finite-sample performance of the proposed methods is examined through extensive simulations. The methods are applied to a vaccine efficacy trial to examine whether and how certain antigens represented inside the vaccine are relevant for protection or anti-protection against the exposing HIVs.

Around 30 years ago, Prentice *and others* (1978) developed the mark-specific proportional hazard (PH) model for discrete marks, and since then a great deal of work has been done in discrete-mark failure time analysis, cf., Kalbfeisch and Prentice (1980), Sun (2001) and Scheike *and others* (2008). Alternatively, Huang and Louis (1998) first considered a continuous mark and developed the nonparametric maximum likelihood estimator of the joint distribution of the failure time and the mark. Motivated by applications in preventative HIV vaccine randomized placebo-controlled efficacy trials, Gilbert *and others* (2004, 2008) developed procedures for testing dependence of mark-specific hazards rates and relative risks (RRs) on the mark variable. Sun *and others* (2009) developed estimation and hypothesis testing methods for the mark-specific PH model with a univariate continuous mark. The mark variable of interest is a measure of the protein sequence distance between an HIV sequence sampled from a volunteer infected in the trial, to an HIV sequence represented inside the tested vaccine construct. As such, the scientific question is if and how the vaccine's effect to reduce the hazard of HIV acquisition depends on the sequence distance mark; answers to this question may guide vaccine development, as discussed extensively in the above-cited papers and elsewhere.

However, the previous work did not account for multivariate marks. This is a serious limitation given that all of the candidate HIV vaccines tested in HIV vaccine efficacy trials have contained multiple antigens/immunogens, with rational to attempt to elicit multiple types of immune response that recognize and block different types of HIV viruses. The greater the number of virus types that can be recognized and killed by vaccine-induced immune responses, the greater the potential overall vaccine efficacy. In the first two efficacy trials, the HIV vaccine construct contained two Envelope (Env) protein antigens, based on two distinct strains of HIV, such that a two-dimensional mark variable is of interest (Flynn *and others*, 2005; Pitisuttithum *and others*, 2006). The vaccine construct evaluated in the third and fourth efficacy trials contained Gag, Pol, and Nef protein antigens, making a three-dimensional mark variable of interest (Buchbinder *and others*, 2008; Gray *and others*, 2009). Lastly, the fifth and most recent efficacy trial tested a vaccine that contained Gag, Pol, and Nef protein antigens, as well as three distinct Env protein antigens, making a six-dimensional mark variable of interest (Rerks-Ngarm *and others*, 2009).

The previous work dealt with the multivariate mark issue by collapsing the multiple distances from an infected subject into a univariate mark—the minimum of the distances to each vaccine antigen. This approach is reasonable under the premise that the only thing that matters for protection is the nearness of the exposing HIV to the closest antigen represented inside of the vaccine (e.g. Gilbert *and others*, 2008). However, there are many ways in which this assumption may fail. For example, based on host genetics (e.g. HLA type or Fc-*γ*-receptor type), one subgroup may be protected through immune responses that recognize HIV peptides that are similar to HIV peptides represented in antigen 1, whereas another subgroup may be protected through immune responses that are similar to HIV peptides represented in antigen 2; in this case, the vaccine efficacy depends on both individual distances and less so on the minimum distance. For a second example, there are many ways to define a protein sequence distance to be putatively immunologically relevant [several distances were used in the sequence analysis of the Buchbinder *and others* (2008) efficacy trial reported in Rolland *and others* (2011)], and if two distances are used such that the first considers many HIV sites irrelevant for protection whereas the second sagely restricts attention to key HIV sites that are contained in epitopes that cause protection, then the first distance could be shorter even though vaccine efficacy depends only on the second. Therefore, a more general approach to assessing and modeling how vaccine efficacy depends on multiple sequence distances is needed, which does not pre-assume a particular way to collapse the multivariate distances into a univariate distance. Outside of the survival analysis field, Gilbert (2000) studied such a general approach with multivariate marks, based on a semiparametric biased sampling model. However, this method is limited by the fact that the model conditions on infection, so that conditional odds ratios but not prospective RRs of infection can be estimated, and by the fact that the model treats HIV infection as a binary outcome, not accounting for the time to HIV infection.

Let *λ*(*t*,*v*|*z*) be the conditional mark-specific hazard function, defined as , where *T* is the failure time, *V* is a continuous mark variable, and *Z*(*t*) is a time-dependent *p*-dimensional covariate. Sun *and others* (2009) studied the mark-specific PH model

(1.1)

for evaluating the mark-specific vaccine efficacy. Under this model, the ratio of hazard functions for any two individuals is a function of the mark independent of time. This assumption may not always be met in practice, and can be relaxed through stratification (Dabrowska, 1997). Here we propose to study the stratified mark-specific PH model with multivariate marks where the baseline hazard function can vary with stratum. Under the stratified mark-specific PH model, the ratio of hazard functions for any two individuals within same stratum is independent of time. However, it may be time dependent for two individuals from different strata. For simplicity, we consider a two-dimensional mark variable *v*=(*v*_{1},*v*_{2}). The methods for general multivariate mark follow a similar outline. Note that the mark-specific RR function *β*(*v*)=*β*(*v*_{1},*v*_{2}) is a *p*-dimensional function depending on *v*. Estimation of *β*(*v*) without any structural assumptions is possible following the procedure for one-dimensional mark variable by Sun *and others* (2009). However, this would require a very large sample size due to the curse of dimensionality. In this paper, we consider a parametric model for *β*(*v*) as the first-order Taylor approximation of *β*(*v*) plus an interaction term that allows investigation of the effects of both marks as well as the interactions. We develop an estimation procedure for the stratified mark-specific PH model under this parametric structure for *β*(*v*). We also develop some testing procedures examining how the vaccine efficacy depends on the marks and how relevant the two marks are for protection against HIV infection. The goodness of fit of the proposed model is tested based on the generalized weighted martingale residuals.

The rest of the paper is organized as follows. In Section 2, we develop a partial likelihood procedure for estimating *β*(*v*). The asymptotic properties of the proposed estimator are derived. Likelihood-based statistical tests are proposed to examine a variety of null hypotheses to understand whether and how the vaccine efficacy depends on the marks. In Section 3, we propose a procedure for testing goodness of fit of the multivariate mark-specific proportional hazard model based on the generalized weighted martingale residuals. The proposed methods are applied to a vaccine efficacy trial in Section 4. Extensive simulations are conducted to examine the finite sample performance of the proposed methods in Section 5. Proofs of the main results are placed in the supplementary material available at *Biostatistics* online.

Let *λ*_{k}(*t*,*v*|*z*) be the conditional mark-specific hazard function at time *t* with mark *v* for a subject in stratum *k* with covariate *z* (Sun *and others*, 2009). We assume that the mark *V* takes value in the interval [0,1]^{2}, rescaled if necessary. The stratified multivariate mark-specific PH model speculates that

(2.1)

for *k*=1,2,…,*K*, where *K* is the number of the strata, *λ*_{0k}(*t*,*v*) is an unspecified baseline mark-specific hazards function for *k*th stratum, *z* is a *p*-dimensional covariate, and *β*(*v*) is a *p*-dimensional vector of regression functions. Here we assume that *β*(*v*) is given by

(2.2)

where each of *β*_{0},*β*_{1},*β*_{2}, and *β*_{12} is a *p*-dimensional vector. Thus *β*(*v*) is completely specified by the 4×*p* parameters denoted by . Let *z*=(*z*_{1},*z*_{2}(*t*))^{T}, where *z*_{1} is the vaccine group indicator, with *z*_{1}=1 for the vaccine group and *z*_{1}=0 for the placebo group, while *z*_{2}(*t*) is other possibly time-dependent covariates. Under the mark-specific PH model (2.1), the mark-specific RR is , where *b*_{1}(*v*) is the coefficient corresponding *z*_{1}. The mark-specific vaccine efficacy *VE*(*v*)=1−*RR*(*v*) represents the percentage reduction for the vaccine versus placebo in risk of failure at time *t* with mark *v* after adjusting for covariates *z*_{2}. It does not depend on time *t* under model (2.1).

Let *T*_{k} be the failure time for an individual in the *k*th stratum, *V*
_{k} be the mark observed at failure and *Z*_{k} be the associated *p*-dimensional covariate. Assume that (*T*_{k},*V*
_{k},*Z*_{k}) follows model (2.1). Under right censoring, the observed random variables are (*X*_{k},*δ*_{k},*δ*_{k}*V*
_{k},*Z*_{k}), where , *δ*_{k}=*I*(*T*_{k}≤*C*_{k}) and *C*_{k} is the censoring random variable, which is assumed to be independent of (*T*_{k},*V*
_{k}) given *Z*_{k}. The mark variable *V*
_{k} is observed if the failure time is uncensored. Let (*X*_{ki},*δ*_{ki},*δ*_{ki}*V*
_{ki},*Z*_{ki}), *i*=1,…,*n*_{k}, be independent identically distributed (iid) replicates of (*X*_{k},*δ*_{k},*δ*_{k}*V*
_{k},*Z*_{k}) from stratum *k*, for *k*=1,2,…,*K*. Let *τ* be the end of follow-up time, where failure times beyond *τ* are considered censored. The estimation of model (2.1) for *β*(*v*) under parametric structure (2.2) is given next.

Let *N*_{ki}(*t*,*v*)=*I*(*X*_{ki}≤*t*,*V*
_{ki}≤*v*,*δ*_{ki}=1) be the marked counting process for *i*th individual in stratum *k*, where the relation *V*
_{ki}≤*v* indicates that the inequality holds for each component of the multivariate mark. Let *Y*
_{ki}(*t*)=*I*(*X*_{ki}≥*t*) be the at-risk process. Under (2.2), *β*(*v*) is completely specified by . Let and with being the Kronecker product. Similar to Kalbfleisch and Prentice (1980) for the competing risks model with finite number of causes, the log-partial likelihood function for can be expressed as

(2.3)

Since , we have

(2.4)

The maximum partial likelihood estimator (MPLE) for is obtained by maximizing given in (2.4). Taking the derivative of with respect to , the score function can be written as

(2.5)

where for *j*=0,1,2. Here for a vector *z*, *z*^{0}=1, *z*^{1}=*z*, and *z*^{2}=*zz*^{T}. Let . The information matrix is the derivative of with respect to given as

(2.6)

The MPLE for is the solution to . Under condition 1 in Section 2.3, the matrix is positive definite with probability 1 as for 1≤*k*≤*K*. Thus exists almost surely and is unique for large sample sizes. The baseline function *λ*_{0k}(*t*,*v*) can be estimated by smoothing the increments of the following estimator of the doubly cumulative baseline function :

(2.7)

where . A kernel estimator of *λ*_{0k}(*t*,*v*) is given by where and , *K*^{(1)}() and *K*^{(2)}() are kernel functions, and *h*_{1} and *h*_{2} are the corresponding bandwidths. The cause-related cumulative incidence function *P*(*T*_{k}≤*t*,*V*
_{k}≤*v*|*Z*_{k})= can be estimated by where and .

Let be the true value of under models (2.1) and (2.2). Let , for *j*=0,1,2, and . Let . We make use of the following regularity conditions.

The covariate process *Z*_{k}(*t*) is left continuous with bounded variation and satisfies the moment condition , where is the Euclidean norm and M is a positive constant such that ( for 1≤*k*≤*K*.

For 1≤*k*≤*K*, *λ*_{0k}(*t*,*v*) is continuous on [0,*τ*]×[0,1]^{2}, and each component of is continuous on [0,*τ*]×[0,1]^{2}×, where is an neighborhood of .

The limit *n*_{k}/*n*→*p*_{k} exists as for 0<*p*_{k}<1 for 1≤*k*≤*K*. The matrix is positive definite for .

The asymptotic results for and , 1≤*k*≤*K*, are presented in the following theorems.

*Under conditions 1–3,*
*converges in probability to*
*as*
*.*

*Under conditions 1–3,*
*as*
*, where*
*can be consistently estimated by*
*.*

*Under conditions 1–3, the following decomposition holds uniformly in (t,v)[0,τ]×[0,1] for 1≤k≤K as*
*:
*

*is asymptotically independent of the processes*
*, k=1,…,K, with the latter being asymptotically independent mean-zero Guassian random fields with variances*
*and with independent increments.*

We propose some statistical tests for evaluating whether and how the vaccine efficacy depends on the marks. The following null hypotheses are examined: *H*_{10}:*β*_{1}=*β*_{2}=*β*_{12}=0; *H*_{20}:*β*_{12}=0; *H*_{30}:*β*_{2}=*β*_{12}=0 and *H*_{40}:*β*_{1}=*β*_{12}=0. The null hypothesis *H*_{10} indicates that the RRs do not depend on the marks; *H*_{20} implies that the marks *v*_{1} and *v*_{2} do not have interactive effects on RRs; *H*_{30} implies that RRs are not affected by *v*_{2}; while *H*_{40} implies that RRs are not affected by *v*_{1}.

Likelihood-based tests such as the likelihood ratio test (LRT), Wald test, and score test are commonly used in the parametric settings. Here we adopt these tests for model (2.1) with *β*(*v*) having the parametric structure (2.2). The tests are constructed based on the log-partial likelihood function given in (2.4). Let be the MPLE maximizing . Denote *H*_{0} as one of the null hypotheses *H*_{10}, *H*_{20}, or *H*_{30}. Let be the estimator of under *H*_{0}, which is the maximizer of under *H*_{0}. For example, for *H*_{10}, is the maximizer of under the restriction *β*_{1}=*β*_{2}=*β*_{12}=0. The LRT statistic is . The Wald test statistic is given by where the information matrix is defined in (2.6). The score test statistic is given by where the score function and information matrix are defined in (2.5) and (2.6), respectively.

Routine analysis following Serfling (1980) shows that under *H*_{0}, *T*_{l}, *T*_{w}, and *T*_{s} converge in distribution to a chi-square distribution with degrees of freedom equal to the number of parameters specified under *H*_{0}. The LRT rejects *H*_{10} if , the upper *α* quantile of the chi-square distribution with 3*p* degrees of freedom. The corresponding critical values for testing *H*_{20}, *H*_{30}, and *H*_{40} are , , and , respectively. Similar decision rules hold for the Wald test with test statistic *T*_{w} and the score test with test statistic *T*_{s}. The parameters *β*_{1},*β*_{2}, and *β*_{12} specified in the null hypotheses are *p*-dimensional vectors. The tests can be carried out for the hypotheses that only concern a single component of the coefficients to examine how one particular covariate effect is modified by the marks.

The estimation and testing procedures developed in Section 2 are developed under model (2.1) with *β*(*v*) having the parametric structure (2.2). The validity of these procedures depends on goodness of fit of the multivariate mark-specific proportional hazard model. This section develops some goodness-of-fit tests of model (2.1) under the parametric structure (2.2) for *β*(*v*) following Lin *and others* (1993). Let . We derive the model checking test statistics based on the cumulative martingale residuals defined as

(3.1)

where is the MPLE given in Section 2, and is defined in (2.7). may be interpreted as the difference up to time *t* between the observed and the predicted number of events with marks at failures less than *v* for the *i*th subject in *k*th stratum. Thus the martingale residuals are informative about model misspecifications. It is easy to check that .

For 1≤*k*≤*K*, let

(3.2)

where *g*_{k}(*Z*_{ki},*z*) is a 1×*r*-vector of known bounded functions of *Z*_{ki} and z. For example, one may take *g*_{k}(*Z*_{ki},*z*)=*f*_{k}(*Z*_{ki})*I*(*Z*_{ki}≤*z*), where *f*_{k}() is a known function and *I*(*Z*_{ki}≤*z*)={*I*(*Z*_{1ki}≤*z*_{1}),…,*I*(*Z*_{pki}≤*z*_{p})}, in which case *r*=*p*. We construct goodness-of-fit test statistics based on the test process *W*(*t*,*v*,*z*)=(*W*_{1}(*t*,*v*,*z*),…,*W*_{K}(*t*,*v*,*z*)). If models (2.1) and (2.2) hold, the process *W*(*t*,*v*,*z*) fluctuates randomly about zero. Various test statistics can be constructed by selecting different weight functions *f*_{k}() and using different functionals of the process *W*(*t*,*v*,*z*). Here we propose the supremum test statistics to test the overall fit of the model:

(3.3)

The distribution of *S* under the null hypothesis is complicated and intractable. To calculate the critical value of the proposed test statistic, we consider using the Guassian multiplier method (Lin *and others*, 1993) that can be applied to approximate the distribution of the process *W*(*t*,*v*,*z*). The key step toward the application of this method is to approximate *W*(*t*,*v*,*z*) with the sum of iid processes as shown next.

Let and , where *A**B* is the Kronecker product of matrices *A* and *B*. Let and . The proof of the following decomposition is given in the supplementary material at *Biostatistics* online.

*Assume conditions 1–3. For 1≤k≤K, we have
*

(3.4)

*as*
*where
*

Expression (3.4) shows that the process *W*_{k}(*t*,*v*,*z*) is asymptotically equivalent to the sum of iid terms involving the integrations with respect to *M*_{li}(*s*,*u*). Donsker's theorem (cf., van der Vaart, 1998) on the weak convergence of empirical processes can be used to show that the process *W*(*t*,*v*,*z*)=(*W*_{1}(*t*,*v*,*z*),…,*W*_{K}(*t*,*v*,*z*)) converges weakly to a multi-dimensional Gaussian random field.

Let {*ξ*_{li},*i*=1,…,*n*_{l},*l*=1,…,*K*} be iid standard normal random variables. For 1≤*k*≤*K*, let

(3.5)

where

Following an application of Sun and Wu (2005, Lemma 1), the distribution of *W*(*t*,*v*,*z*) can be approximated by the conditional distribution of given the observed data sequence. The continuous mapping theorem entails that the distribution of *S* can be approximated by the empirical distribution of obtained through repeatedly generating independent sets of {*ξ*_{li},*i*=1,…,*n*_{l},*l*=1,…,*K*} while holding the observed data fixed. We reject model (2.1) under the parametric structure (2.2) for *β*(*v*) at the significance level *α* if *S* is greater than the critical value *S*_{α}—the upper *α* quantile of the copies of *S** described above.

We now illustrate our methods with the analysis of data from a vaccine efficacy trial conducted in North and South America (Buchbinder *and others*, 2008). The ‘Step’ trial randomized 1836 HIV-negative men to receive either the Merck Adenovirus 5 (Ad5) vaccine (MRKAd5) or placebo. Women were also enrolled, but only one became HIV infected. Of the 1836 men, 87 acquired HIV infection. The sequencing labs attempted to derive single-genome-amplification HIV sequences from all 87 subjects, and were only successful for 65. The 22 men with no sequence data were excluded from the analysis, and therefore for the method to provide consistent estimation the missing completely at random (MCAR) assumption is needed. While MCAR seems reasonable in the application (partial support comes from an analysis of the predictors of missing sequence data, which uncovered no significant predictors), it cannot be assured. Sun and Gilbert (2012) developed an augmented inverse probability-weighted approach to provide consistent estimation and inference in the mark-specific proportional hazards model under a missing at random (MAR) assumption, but this method considered only a univariate mark variable. It would be of interest to extend their method to the multivariate mark setting. Alternatively, it would be of interest to extend the multiple imputation approach of Lu and Tsiatis (2001) from the discrete mark setting to the continuous mark setting. The randomization was stratified by whether a volunteer had pre-existing immunity to the Ad5 vector that was used in the vaccine, defined by an Ad5 neutralization titer greater than 200. The method is implemented accounting for these two strata, where 1058 (778) volunteers had Ad5 below (above) 200. In the data analysis, we focus on men only.

For those with Ad5 ≤200, there were 54 total infections: 29 of 522 in the vaccine group (with 7 missing marks) and 25 of 536 in the placebo group (with 8 missing marks). The annual incidences were 5.6% for the vaccine group and 4.7% for the placebo group. For Ad5>200, there were 33 total infections: 24 of 392 in the vaccine group (with 7 missing marks) and 9 of 386 infections in the placebo group (with 0 missing marks). The annual incidences were 6.1% for the vaccine group and 2.3% for the placebo group.

Two factors motivate the need to consider multiple protein sequence distances for the sieve analysis (Gilbert, 2000). First, the MRKAd5 vaccine contained three HIV-1 proteins: Gag, Pol, and Nef. Each of these proteins stimulates immune responses in vaccine recipients that could plausibly make the vaccine effect on infection vary with genetic distance to the protein. As a control, it is also of interest to evaluate genetic distances to all of the HIV-1 vaccine proteins combined not included in the vaccine, Env-Rev-Tat-Vif-Vpr-Vpu, for which the central HXB2 reference strain was used. Because the vaccine should not be able to stimulate immune responses to these proteins, it is expected that the vaccine effect on infection would not vary with this control genetic distance. Secondly, as described in greater detail in the clinical paper (Rolland *and others*, 2011), two different bioinformatics methods, NetMHC (Buus *and others*, 2003) and Epipred (Heckerman *and others*, 2007), were used to predict, for each infected subject, the set of HIV peptides in the reference sequence that could potentially bind to at least one of his/her HLA alleles. These peptides are the ones that could potentially constitute T-cell epitopes and hence contribute to a vaccine effect on infection.

NetMHC predicts quantitative binding of peptides to four-digit HLA alleles, whereas Epipred identifies known and potential HIV-1 CTL epitope motifs using two-digit HLA information. In addition to the known HLA-restricted epitopes previously reported at the Los Alamos National Laboratory HIV database (HIVDB), we accepted all epitope motifs with a posterior probability of >0.8. HLA-specific epitopes were predicted in the Gag-Pol-Nef protein sequence contained in the tested vaccine (the ‘Step’ reference sequence) and in all proteins from the HXB2 reference sequence (available at the HIVDB, http:www.hiv.lanl.gov). Using Epipred, the first step in computing the distance between a subject's sequences and the reference sequence is to compute the nonparametric maximum likelihood estimate (NPMLE) of the number of peptides shared between the reference sequence and the subject's sequences, defined as the sum of estimated epitope-probabilities across all 8-, 9-, 10-, 11-mers in the reference sequence that are exactly matched in all of the subject's sequences. Then, the distance is the NPMLE of the percent of peptides mismatched in at least one of the subject's sequences, defined as one minus the ratio of the NPMLE of the number of shared peptides (computed in the first step) and the NPMLE of the number of peptides in the reference sequence. Because the NetMHC software categorizes quantitative binding readouts as nonbinder, weak binder, or strong binder, we defined the distance as the estimated percent of epitopes mismatched in at least one of the subject's sequences, the latter defined as the number of weak or strong binding 8-, 9-, 10-, or 11-mers in the reference sequence that mismatch the corresponding peptide in at least one of the subject's sequences.

We consider distances defined using the reference HIV-1 regions Gag, Pol, Nef, Env-Rev-Tat-Vif-Vpr-Vpu, as well as using the two bioinformatics prediction methods. The five selected marks, {*v*_{j},*j*=1,…,5}, are listed in Table 1. We analyze the data with the mark-specific proportional hazards model (2.1), where the covariate *z* is the indicator for the treatment, with *z*=1 for the vaccine group and 0 for placebo group. The *K*=2 strata are the Ad5≤200 and Ad5>200 groups. We use the standardized marks in the analysis, which are obtained by subtracting the minimum and dividing the range of the measurements for each mark. We perform the backwards model selection procedure by starting the analysis with all five marks in the model (2.1) with . The marks with nonsignificant coefficients are eliminated one at a time, each time the mark with the least significant *p*-value is eliminated. The backward selection procedure retains the last two marks *v*_{1} and *v*_{2}.

For the remaining two marks *v*_{1} and *v*_{2}, we fit the model (2.1) with *β*(*v*)=*β*_{0}+*β*_{1}*v*_{1}+*β*_{2}*v*_{2}+*β*_{12}*v*_{1}*v*_{2}. The goodness of fit of model (2.1) with *β*(*v*) given by (2.2) is conducted following the procedure given in Section 3 with the test statistic . We set *f*_{k}(*z*)=1 and *K*=2. The *p*-value is calculated as the percentage of 500 copies of greater than *S*. The *p*-value for the bivariate mark selection is 0.56. These result supports that the model fit is adequate. The estimates are (3.43,−3.40,−10.61,13.90). The standard errors of are (1.52,2.96,3.63,5.79). Table 2 shows the *p*-values of the LRT, Wald test (Wald), and score test (Score) for testing *H*_{10}–*H*_{40} described in Section 2.4. All four hypotheses are rejected with borderline significance level near 0.05, with consistent *p*-values across the three tests.

The p-values of the LRT, Wald test, and score test for testing *H*_{10}, *H*_{20}, and *H*_{30} for the bivariate marks (*v*_{1},*v*_{2}) for the step data

Because the overall infection rate was higher in the vaccine than placebo group (Cox model RR = 1.50, 95% CI 0.95–2.41, *p*=0.06), we estimate the mark-specific RR as RR(*v*_{1},*v*_{2})=exp(*β*_{1}(*v*_{1},*v*_{2})) instead of the mark-specific vaccine efficacy; thus for the step trial the analysis focuses on whether and how the vaccine selectively increased the susceptibility to HIV infection. Based on the *β* point estimates, the estimated RR, (its standard error), equals 3.27 (3.14), 0.91 (1.63), and 3.08 (11.21) for (*v*_{1},*v*_{2}) = (0.2,0.2), (0.5, 0.5), and (0.8, 0.8), respectively. Hence the 95% confidence intervals for *RR*(*v*_{1},*v*_{2}) is (−2.88,9.42), (−2.28,4.10), and (−18.89,25.05) at (*v*_{1},*v*_{2}) = (0.2,0.2), (0.5, 0.5), and (0.8, 0.8), respectively. The interpretation is that the vaccine had no effect on infection against HIV-1 when both distances are in the range from 0.2 to 0.8.

In this section, we conduct a simulation study to examine finite sample properties of the proposed estimation and hypothesis testing procedures. The simulations are set up to mimic the step data from the HIV vaccine efficacy trial conducted in North and South America (Buchbinder *and others*, 2008). We take *K*=2 strata corresponding to two levels of neutralization titer Ad5. Two sample sizes *n*_{k}=250 and 400 are considered for each stratum. The baseline hazard functions *λ*_{0k}(*t*,*v*) are set as constants *λ*_{0k}, with *λ*_{01}=0.4 and *λ*_{02}=0.6. The covariate *Z*_{k} is a Bernoulli random variable with *Z*_{k}=1 corresponding to the vaccine group and *Z*_{k}=0 for the placebo group. We take *P*(*Z*_{k}=1)=0.5. The censoring time *C*_{k} follows an exponential distribution, independent of (*T*_{k},*V*
_{k}). We consider the following selections of (*β*_{0},*β*_{1},*β*_{2},*β*_{12}) under (2.1) and (2.2) to examine the proposed estimation and hypothesis testing procedures:

Here *M*_{10} is a model under the null hypothesis *H*_{10} which states that *β*(*v*) does not depend on either mark *v*_{1} or *v*_{2}. Under *M*_{11}, the mark-specific RR depends on both marks but not the interaction of these marks. Under *M*_{12}, the mark-specific RR depends on both marks and their interaction. *M*_{20} is a model under the null hypothesis *H*_{20} where only *β*_{12} is zero, and *M*_{30} is a model under the null hypothesis *H*_{30} where *β*(*v*) does not depend on *v*_{2}. The overall RRs of the vaccine versus the placebo in most of the settings are around 50%. The percentage of censoring in all settings is about 70% at the end of follow-up *τ*=2.

Table 3 lists the bias, the sample standard error of the estimates (SSE), the average of the estimated standard errors (ESE), and the coverage probability (CP) for *β* for sample size *n*_{k}=250 and 400. Table 3 shows that the biases of the estimators are small, ESE approximates SSE well, and the coverage probabilities are very close to their nominal level of 0.95. Table 4 summarizes the empirical sizes and powers of the LRT, Wald test, and the score test for testing *H*_{10},*H*_{20}, and *H*_{30} at the significance level *α*=0.05. The average numbers of cases for the vaccine (Vx) and placebo (Plb) groups for are also reported. Each entry of Tables 3 and 4 is calculated based on 1000 simulations. Table 4 shows that the empirical sizes of all these tests are close to 0.05. The LRT has better power than the Wald and score tests. The powers of the tests increase as the simulation model moves from *M*_{11} to *M*_{12}, representing an increased departure from *H*_{10}. Similarly, the powers of these tests for testing *H*_{20} increase in the direction from *M*_{21} to *M*_{22}. The powers for testing *H*_{30} also increase in the direction from *M*_{31} to *M*_{32}, and the powers increase as the sample size increases. The coverage probabilities for *β*_{0}, *β*_{1}, *β*_{2}, and *β*_{12} are also listed to demonstrate that the proposed maximum partial likelihood methods work very well.

Summary statistics and coverage probabilities of the 95% confidence intervals for (*β*_{0},*β*_{1},*β*_{2},*β*_{12}) under models *M*_{10}, *M*_{20}, and *M*_{30}

In this section, we conduct a simulation study to check the finite sample performance of the proposed goodness-of-fit testing procedure based on the test statistic *S*=sup_{1≤k≤K}sup_{t,v,z}|*W*_{k}(*t*,*v*,*z*)| defined in (3.3). We set *f*_{k}(*z*)=1 and *K*=2. The size of the test is examined using the following simple stratified mark-specific proportional hazards model:

(5.1)

where *λ*_{01}=0.4, *λ*_{02}=0.6 and (*β*_{0},*β*_{1},*β*_{2},*β*_{12})=(−1.65,0.4,0.9,1.2). The covariate *Z*_{ki} is a Bernoulli random variable with *P*(*Z*_{ki}=1)=0.5. The censoring times are generated from an exponential distribution, independent of (*T*_{ki},*V*
_{ki}). The first block in Table 5 shows the empirical sizes of the test for sample sizes of *n*=*n*_{1}+*n*_{2}=100, 200, and 300. The empirical sizes are calculated based on 1000 simulations and 500 Gaussian multiplier samples, and they are very close to the 0.05 nominal level.

To evaluate the power of proposed test, we consider the model

(5.2)

where *λ*_{0k}(*t*,*v*)=*abkt*^{(b−1)}exp{*at*^{b}−0.4*v*_{1}−0.6*v*_{2}}, and *a* and *b* are the parameters whose values are to be selected. Again we take *Z*_{ki} as a Bernoulli random variable with *P*(*Z*_{ki}=1)=0.5. Model (4.1) is not a mark-specific proportional hazards model since the relative hazard ratio *λ*_{k}(*t*,*v*|*z*=1)/*λ*_{k}(*t*,*v*|*z*=0)=exp{*at*^{b}−0.4*v*_{1}−0.6*v*_{2}} changes with time. We consider the alternative models specified by the choices of (a,b), defined as *M*_{41}:(*a*,*b*)=(0.20,0.30); *M*_{42}:(*a*,*b*)=(0.25,0.40); and *M*_{43}:(*a*,*b*)=(0.30,0.50). As *a* and *b* increase, the rate of change with respect to *t* of the relative hazard ratio increases, which represents an increased departure from the null hypothesis. For each of the models *M*_{41}, *M*_{42}, and *M*_{43}, random right censoring times are generated from an exponential distribution, independent of (*T*_{ki},*V*
_{ki}). The empirical powers of the test at the significance level 0.05 under the models *M*_{41}, *M*_{42}, and *M*_{43} at sample sizes *n*=*n*_{1}+*n*_{2}=100, 200, and 300 are given in Table 5. Each entry of the table is based on 1000 simulations and 500 Gaussian multiplier samples. The power of the test increases with increasing (*a*,*b*), and also increases with sample size. The limited simulation study demonstrates the validity of the proposed goodness-of-fit testing procedure. The test provides a valuable tool to check the adequacy of the stratified mark-specific proportional hazards model specified in (2.1) and (2.2).

Supplementary material is available at http://biostatistics.oxfordjournals.org.

This research was partially supported by the National Institutes of Health (grant number R37 AI054165-09). The research of Y.S. was also partially supported by the National Science Foundation (grant number DMS-0905777).

The authors thank the referees for their valuable inputs that have improve the paper. They also thank the HIV Vaccine Trials Network (HVTN) and Merck for providing the data analyzed in this article. The HVTN is supported through a cooperative agreement with the National Institutes of Health Division of AIDS. *Conflict of Interest*: None declared.

- Buchbinder S. P., Mehrotra D. V., Duerr A., Fitzgerald D. W., Mogg R., Li D., Gilbert P. B., Lama J. R., Marmor M., Del Rio C. Efficacy assessment of a cell-mediated immunity HIV-1 vaccine (the Step Study): a double-blind, randomised, placebo-controlled, test-of-concept trial. Lancet. 2008;372:1881–1893. [PMC free article] [PubMed]
- Buus S., Lauemoller S. L., Worning P., Kesmir C., Frimurer T., Corbet S., Fomsgaard A., Hilden J., Holm A., Brunak S. Sensitive quantitative predictions of peptide-MHC binding by a Query by Committee artificial neural network approach. Tissue Antigens. 2003;62:378–384. [PubMed]
- Dabrowska D. M. Smoothed Cox regression. The Annals of Statistics. 1997;25:1510–1540.
- Flynn N. M., Forthal D. N., Harro C. D., Judson F. N., Mayer K. H., Para M. F. Placebo-controlled trial of a recombinant glycoprotein 120 vaccine to prevent HIV infection. Journal of Infectious Diseases. 2005;191:654–665. [PubMed]
- Gilbert P. B. Large sample theory of maximum likelihood estimates in semiparametric biased sampling models. Annals of Statistics. 2000;28:151–194. the rgp120 HIV Vaccine Study Group.
- Gilbert P. B., McKeague I. W., Sun Y. Tests for comparing mark-specific hazards and cumulative incidence functions. Lifetime Data Analysis. 2004;10:5–28. [PubMed]
- Gilbert P. B., McKeague I. W., Sun Y. The two-sample problem for failure rates depending on a continuous mark: an application to vaccine efficacy. Biostatistics. 2008;9:263–276. [PubMed]
- Gray G. E., Bekker L., Churchyard G. J., Nchabeleng M., Mlisana K., de Bruyn G., Roux S., Mathebula M., Latka M., Bennie T. Did unblinding affect HIV risk behaviour and risk perception in the HVTN503/Phambili study? Retrovirology. 2009;6 (Suppl 3):209.
- Heckerman D., Kadie C., Listgarten J. Leveraging information across HLA alleles/supertypes improves epitope prediction. Journal of Computational Biology. 2007;14:736–746. [PubMed]
- Huang Y., Louis T. A. Nonparametric estimation of the joint distribution of survival time and mark variables. Biometrika. 1998;85:785–798.
- Kalbfeisch J. D., Prentice R. L. The Statistical Analysis of Failure Time Data. New York: Wiley; 1980.
- Lin D. Y., Wei L. J., Ying Z. Checking the Cox model with cumulative sums of Martingale-based residuals. Biometrika. 1993;80:557–572.
- Lu K., Tsiatis A. A. Multiple imputation methods for estimating regression coefficients in the competing risks model with missing cause of failure. Biometrics. 2001;57:1191–1197. [PubMed]
- Pitisuttithum P., Gilbert P., Gurwith M., Heyward W., Martin M., van Griensven F., Hu D., Tappero J. W., Choopanya K. Randomized, double-blind, placebo-controlled efficacy trial of a bivalent recombinant glycoprotein 120 HIV-1 vaccine among injection drug users in Bangkok, Thailand. Journal of Infectious Diseases. 2006;194:1661–1671. [PubMed]
- Prentice R. L., Kalbfleisch J. D., Peterson A. V., Fluornoy N., Farewell V. T., Breslow N. E. The analysis of failure time in the presence of competing risks. Biometrics. 1978;34:541–554. [PubMed]
- Rerks-Ngarm S., Pitisuttithum P., Nitayaphan S., Kaewkungwal J., Chiu J., Paris R., Premsri N., Namwat C., de Souza M., Adams E. Vaccination with ALVAC and AIDSVAX to prevent HIV-1 infection in Thailand. The New England Journal of Medicine. 2009;361:2209–2220. [PubMed]
- Rolland M., Tovanabutra S., deCamp A. C., Frahm N., Gilbert P., Sanders-Buell E., Heath L., Magaret C. A., Bose M., Bradfield A. Genetic impact of vaccination on breakthrough HIV-1 sequences from the Step trial. Nature Medicine. 2011;17:366–371. [PMC free article] [PubMed]
- Scheike T., Zhang M., Gerds T. A. Predicting cumulative incidence probability by direct binomial regression. Biometrika. 2008;95:205–220.
- Serfling R. J. Approximation Theorems of Mathematical Statistics. New York: Wiley; 1980.
- Sun Y. Generalized nonparametric test procedures for comparing multiple cause-specific hazard rates. Journal of Nonparametric Statistics. 2001;13:171–207.
- Sun Y., Gilbert P. B. Estimation of stratified mark-specific proportional hazards models with missing marks. Scandinavian Journal of Statistics. 2012;39:34–52. [PMC free article] [PubMed]
- Sun Y., Gilbert P. B., McKeague I. W. Proportional hazards models with continuous marks. The Annals of Statistics. 2009;37:394–426. [PMC free article] [PubMed]
- Sun Y., Wu H. Semiparametric time-varying coefficients regression model for longitudinal data. Scandinavian Journal of Statistics. 2005;32:21–47.
- van der Vaart A. W. Asymptotic Statistics. Cambridge: Cambridge University Press; 1998.

Articles from Biostatistics (Oxford, England) are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |