|Home | About | Journals | Submit | Contact Us | Français|
This paper presents methods for testing covariate effect in the Cox proportional hazards Model based on Kullback-Leibler divergence and Renyi’s information measure. Renyi’s measure is referred to as the information divergence of order γ (γ ≠ 1) between two distributions. In the limiting case γ → 1, Renyi’s measure becomes Kullback-Leibler divergence. In our case, the distributions correspond to the baseline and one possibly due to a covariate effect. Our proposed statistics are simple transformations of the parameter vector in the Cox proportional hazards model, and are compared with the Wald, likelihood ratio and Score tests that are widely used in practice. Finally, the methods are illustrated using two real-life data sets.
The Cox proportional hazards (PH) model (Cox, 1972) offers a method for exploring the association of covariates with the survival time outcome variable often seen in medical and engineering studies. The observed data consist of independent observations on the triple (X, δ, z), where X is the minimum of a survival and censoring time pair (T, C), δ = I(T ≤ C) is the indicator of the event that a death has been observed and z = (z1, z2, …, zp)′ is a p–vector of covariates. The non-negative random variables T and C denote the survival and censoring times respectively, and are assumed to be independent conditional on the covariates.
The Cox PH model is perhaps most widely used in censored survival data analysis. For a given vector of observed covariates z, the hazard function of T is modeled as
where λ0(t) is an unspecified function of time and β′ = (β1, ··· βp) is a row vector of unknown regression parameters. Alternate forms of the Cox PH model include
Checking for covariate effect in (1.1) is a matter of great interest. Several procedures have been described that include the Wald, likelihood ratio (LR) and Score tests among others (see Klein and Moeschberger (2003)). In this paper, we develop tests for covariate effect in the Cox PH model (1.1) that are easy to compute as well as perform better than these statistical tests. Our tests are based on Kullback-Leibler (KL) divergence as well as Renyi’s divergence. In Section 2, we derive expressions for our test statistics based on these measures. In Section 3, we discuss the implementation of the proposed tests and give two illustrative examples. Section 4 compares the performance of the proposed tests with existing tests in terms of power and consistency via extensive simulation studies. Section 5 provides some concluding remarks.
In this case, our null hypothesis of interest is non-parametric and is given by
The alternative is given by
where the terms are as defined earlier. We observe that the null hypothesis corresponds to β = 0 indicating no covariate effect. We can now re-write our hypotheses as
In general, the KL divergence between two distribution functions F and F0 is
where f and f0 are the density functions corresponding to F amd F0 respectively. I(F: F0) is also known as the directed divergence and measures the discrepancy between F and F0 in the direction of F. It is well known that I(F: F0) ≥ 0 and the equality holds if and only if F = F0. Similarly, one can also define the directed divergence
measuring the difficulty of discriminating between F0 and F.
Using (1.2), we can write the density functions under H1 and H0 as
and the directed divergence I2(F0: F |z) given covariate vector z, is given by
The divergence measure J(F: F0|z) given the covariate vector z is given by
Another information measure that one may use to discriminate between H0 and H1 is Renyi’s measure. This measure generalizes KL divergence and is referred to as the information divergence of order γ between two distributions F and F0 and corresponding densities f and f0. Conditional on the covariate vector z, this measure is defined as
where γ is the order of information and γ ≠ 1. In the limit γ →1, (2.9) and (2.12) coincide. Information divergence of order γ, Rγ(F: F0|z), is symmetric only for the case , i.e., . This is better known as the Bhattacharya distance. For our application, we consider several values of γ in the interval (0, 1). For more details about Renyi’s measure, the interested reader is referred to Asadi et al. (2006).
It should be emphasized that an important feature of our proposed measures I1, I2, J and Rγ is that all these information measures are invariant under any non-singular transformation on the failure time data.
Let us denote the observed data by (Xj, δj, zj), j = 1, …, n, with Xj = min(Tj, Cj), the minimum of a failure and censoring time pair, δj = I(Tj ≤ Cj) is the indicator of the event that a death has been observed and zj = (z1j, …, zpj) is a p–vector of covariates associated with an individual j. Since evaluations of (2.9)–(2.12) require complete knowledge of β, then I1(F: F0|z), I2(F0: F|z), J(F: F0|z) and Rγ(F: F0|z) are not operational. We operationalize these measures by replacing β with , the maximum partial likelihood (MPL) estimate based on (Xj, δj, zj), j = 1, ···, n.
respectively. It should be noted that one can use (2.13)–(2.16) to perform the global hypothesis of H0: β = β0. For global testing, one can easily verify that must be replaced by – β0 in equations (2.13)–(2.16). Often, one is interested in testing a hypothesis about a subset of the β’s, i.e., local tests. The hypothesis is then H0: β1 = β10, where . Here β1 is a q × 1 vector of the β’s of interest and β2 is the vector of the remaining p – q β’s. In this situation I1, I2, J and Rγ are obtained by simply replacing β with β – β0 in equations (2.10)–(2.12), where . Consequently, Î1, Î2, Ĵ and γ can be obtained by estimating β – β0 with – 0, where and 0 are the MPL estimators of β under H1 and H0 respectively.
The following theorem gives asymptotic distributions of our proposed test statistics.
Î1; Î2; Ĵ and γ are maximum likelihood estimators and are asymptotically normal with means I1; I2; J and Rγ respectively.
As seen from (2.13)–(2.16), Î1, Î2, Ĵ and γ are simple transformations of the MPL estimator . Using the fact that is asymptotically multivariate normal with mean β and the invariance property of maximum likelihood estimators, we see that the proposed statistics are in fact maximum likelihood estimators and are asymptotically normal with the means stated in the theorem. See Klein and Moeschberger (2003) for more details and for the variance-covariance matrix of . This concludes the proof.
Using Theorem 1, we compute the exact variances of the proposed statistics Î1, Î2 and Ĵ under the null hypothesis. For simplicity, we derive the variance estimators for the case of a single covariate z. The extension to multiple covariates is straightforward. Given that is asymptotically normal with mean β = 0 and variance σ2 under the null, and using the moment generating function of the normal random variable (i.e., if ~ N(0, σ2), then for zi fixed, i = 1, …, n, and , we have,
For our measure based on Renyi’s divergence γ, the variance estimator is obtained using the delta method, simply by expanding Rγ() around β.
where . Using the first three terms in the expansion and the fact that E(2k+1) = 0 for k odd and for k even, we obtain a reasonable approximation to the variance estimator given by
In this section, we describe the implementation of our proposed test statistics for testing (2.1) against (2.2). One can also follow similar steps for global testing as well as local testing. We apply the following steps to implement Î1(F: F0|z), Î2(F0: F|z), Ĵ(F: F0|z), and γ(F: F0|z):
Step 1: Use partial likelihood procedure to estimate β (see Klein & Moeschberger (2003)). Denote the estimate by .
Step 3: From Theorem 1, it is clear that Î1, Î2, Ĵ and γ in (2.13) – (2.16) have asympotically normal distributions under H0. As shown in the previous section, the means and variances of Î1, Î2, Ĵ and γ can easily be obtained using Theorem 1. In fact, under H0, the means of all the proposed test statistics are zero.
Step 4: (a) Using Îi, reject H0 if , i = 1, 2, (b) Using J, reject H0 if , (c) Using Rγ, reject H0 if where p is the number of parameters in the model. Equivalently, we can also use the normalized z-scores and the standard normal distribution to test our hypothesis for each proposed test.
From Theorem 1, it is clear that the tests specified above using Î1, Î2, Ĵ and γ are all consistent tests.
We implemented our methods using the R statistical system (R Development Core Team (2005)). The MPLE was computed using the function coxph and the variance-covariance matrix of under H0 (Var(0)) was computed using the function vcov. All computations for the simulations, including those pertaining to the variances of the proposed statistics, were programmed in R. we utilized the parallel implementation in the snow package in R using the High Performance Computing cluster at Fox Chase Cancer Center.
In this section we illustrate our approach using two real-life examples. For Renyi’s measure, we only compute γ for γ = .5 which gives us the symmetric case simply for the purpose of illustration.
The data consist of times to remission (in weeks) of 42 leukemia patients from Kalbfleisch & Prentice (2002). This data set is available in the R statistical system (R Development Core Team (2005)). The patients are randomized into two groups; a control group and the treatment group for which the drug 6-MP was administered. Each group consists of 21 patients. The covariate is group membership specified by the indicator variable z where z = 0 denotes the control group and z = 1 denotes the treatment group.
We are interested in testing for covariate effect i.e., if there is any significant treatment effect (z = 1) relative to the control group (z = 0). Here p = 1 and the null hypothesis is
corresponding to no covariate effect against the two-sided alternative
The Cox PH model was fit to this data. The estimate of the regression coefficient is 1.57 with a standard error of 0.412 and a significant p-value of 1.4e-04. The estimate of the relative risk is e = 4.81 i.e., the risk is 4.81 times higher for a patient in the control group relative to that of a patient in the treatment group. Using the MPL estimate of β, = 1.57, our estimate of divergence given treatment group membership i.e., z = 1 is given by
Also, Ĵ(F: F0|z = 0) = 0. Consequently from (2.15), Ĵ(F:F0|z) = 63.51 and which gives a p-value < 1e-09 using an asymptotic χ2 distribution with 1 degree of freedom under the null hypothesis. Hence, we reject the null hypothesis of no covariate effect at a significance level 0.05. Similarly from (2.13), Î1(F: F0|z) = 47.14 and which gives a p-value < 1e-09, and from (2.14), Î2(F0: F|z) = 16.34 and which gives a p-value < 1e-09. Finally, using (2.16), Renyi’s divergence of order γ for γ = 0.5 is 0.5(F: F0|z) = 11.83 and which also gives a p-value < 10−9. In this example, all our proposed tests confirm that the treatment has a positive effect. In the case of Wald, LR and Score tests, the computed statistics (and p-values) were 14.5 (1.38e-04), 16.4 (5.26e-05) and 17.3 (3.28e-05), respectively. Hence, these standard tests confirm the results from our proposed statistics.
We now consider the leukemia data described in Feigl & Zelen (1965). This data set consists of uncensored survival times of 33 patients who died from acute myelogenous leukemia, and is also available in the R statistical system (R Development Core Team (2005)). The patients were also factored into 2 groups according to the presence or absence of a morphologic characteristic of white blood cells (AG). The white blood cell count at the time of diagnosis was also measured for each patient (WBC).
We are interested in testing the effects of AG and WBC on the survival time of patients. Here, AG is a binary variable (present/absent) and WBC is a continuous covariate. Also, p = 2 and the null hypothesis is
corresponding to no effects due to AG and WBC against the two-sided alternative
that at least one covariate has an effect on survival time. We fit the Cox PH model to this data with AG (z1), and log(WBC) (z2), as our covariates, and performed a global test of the null hypothesis. AG is coded as 1 for present and 0 for absent. Based on this, the MPL estimates of the regression coefficients are 1 = −1.069 and 2 = 0.368 with corresponding p-values 0.013 and 0.017, respectively. That is, AG has a negative effect and WBC has a positive effect on survival time, and both effects are statistically significant. Now, our estimate of the divergence measure given by (2.15) is Ĵ(F: F0|z) = 901190.8. Consequently, which gives a p-value of about .0031 using an asymptotic χ2 distribution with 2 degrees of freedom under the null hypothesis. Hence, we reject the null hypothesis of no covariate effect at a significance level 0.05. Similarly from (2.13), Î1(F: F0|z) = 716.38 and which gives a p-value of 0.0086. Also, Î2(F0: F|z) = 900474.4 and which gives a p-value of about .0017. Finally, using (2.16), Renyi’s divergence of order γ for γ = 0.5 is 0.5(F: F0|z) = 177.78 and which gives a p-value of .0015. It is obvious that all proposed tests confirm an effect due to the covariates. In the case of Wald, LR and Score tests, the computed statistics (and corresponding p-values) were, respectively, 15.1 (0.000537), 15.6 (0.000401) and 16.5 (0.000263). Hence, these standard tests confirm the results from our proposed statistics. It is worth noting here that if we were interested in performing a test of the local hypothesis H0: βi = 0 vs. H1: βi ≠ 0, i = 1 or 2, then one could use the approach described in Section 2 on page 6.
We compared the proposed test statistics Î1(F: F0|z), Î2(F0: F|z), Ĵ(F: F0|z) and γ(F: F0|z) for testing covariate effect in the Cox PH model with existing methods such as the Wald, Likelihood Ratio and Score tests. For Renyi’s divergence, we considered γ = 0.1, 0.25, 0.5 and 0.75 in our simulations. We used empirical estimates of power and consistency based on 5000 simulations for various sample sizes and censoring patterns. We considered the two sample problem (binary covariate case) as well the continuous covariate case.
For the two sample problem, simulations were performed using randomly generated data from exponential distributions with means 1 and 0.5. Sample sizes ranged from 5 to 20 per group (total sample size ranged from 10 to 40). Independent censoring based on a uniform distribution over a fixed interval (0,B) was applied to result in 25% censored observations. For the continuous covariate case, we randomly generated data from exponential distributions with means 0.5, 1 or 2 using the method outlined in Bender et al. (2005). For a sample of size n, we created a covariate vector by randomly permuting the vector (1/n, 2/n, …, n/n) once or generating a single sample from a standard normal or a uniform (0, 1) distribution and using this covariate vector across all simulations. Sample sizes ranged from 10 to 50 and simulations were performed in the manner outlined for the two sample problem.
For each sample size, we computed the power and consistency at various levels of the Type I error α (i.e., α = 0.01, 0.05, 0.1, 0.15, 0.2, 0.25) based on 5000 simulations for each test. The simulation results for the two sample problem are tabulated in Tables 4.1 through 4.4. For the continuous covariate case, results are tabulated in Tables 4.5 through 4.8. In each table, the first row corresponding to each test statistic indicates the power and the second row (numbers enclosed within paranthesis) indicates the consistency for each α. These results demonstrate that the proposed tests outperform all the leading tests in terms of power. In particular, we observe from the simulation results that in most cases, Î1 has better performance relative to all other proposed tests in terms of both power and consistency. This is true across sample sizes and censoring proportions. Also, as γ →1, there is an improvement in the consistency of γ. There is also an overall improvement in its power and in some cases, higher values of γ result in more power than Î1.
Overall, we find that for some loss in consistency, there is a substantial increase in power for almost all of the proposed tests relative to the three standard tests. The only exception to this is Î2. It is interesting to note that the realized α for the proposed tests stabilize at around a Type I error cut-off of 0.15. This is not observed in the case of the three standard tests. It is well known that the score test is the most powerful test, asymptotically, under the proportional hazards assumption. However, for small to moderate sample sizes, we find that the proposed tests demonstrate better performance. In particular, we recommend using our tests for small sample sizes. For larger sample sizes, we found that the performance of the proposed and standard tests are very similar (data not shown). In that case, use of either test (proposed or standard) would be appropriate. For the continuous covariate case, the overall results were similar across these three different approaches used for creating the covariate vector. This demonstrates that the proposed tests are insensitive to the choice of the covariate structure. Furthermore, for larger censoring proportions (such as 50% censored observations), we found the performance of the proposed and standard tests to be similar overall (data not shown). It should be mentioned that the critical value for the Wald statistic was based on the normalized z-score and those for all other test statistics in this study were based on a Chi-square distribution with 1 degree of freedom at a significance level of 0.05.
In this paper, we proposed several tests for covariate effect in the Cox PH model based on KL divergence and Renyi’s divergence. These tests are useful for discriminating between the baseline distribution and one possibly due to the covariates in the Cox PH model. An important advantage of these discrimination information functions is that all of them are invariant under any non-singular transformation of the failure time data.
Our proposed tests are moderately easy to compute. An attractive feature of the proposed methods is that the test statistics are simple functions of the MPLE and independent of the baseline distribution. Using the results derived in this paper, one can easily implement our tests in a readily available, open source programming environment such as the R statistical system. All these tests, particularly Î1 and γ as γ → 1, perform very well in terms of power compared with other leading tests, and in terms of both power and consistency relative to the other proposed tests. For small to moderate sample sizes, the increase in power we observe is substantial relative to the loss in consistency for all these tests compared with the three standard tests. Based on our results, the proposed tests appear to be an attractive alternative to the standard tests for small sample sizes.
While we have specifically focused on testing for any covariate effect in the widely used Cox PH model using a class of discrimination information functions, one could be interested in applying similar methods to other survival models such as the accelerated failure time (AFT) model (Etezadi-Amoli & Ciampi, 1985) or the additive hazards (AH) model (Tibshirani & Ciampi, 1983). Unlike the Cox PH model, the AFT model allows for crossing hazard curves. For a test of the null hypothesis of no covariate effect against the alternative specified by the AFT model, the discrimination information functions (analogous to the quantities specified in (2.9)–(2.12)) would be dependent on the baseline hazard. This is also the case with the AH model. In that situation, one would have to apply spline approximations for parameter estimation as described in Devarajan (1999), and then follow the methods described in this paper for testing the covariate effect in these models. Devarajan & Ebrahimi (2001) utilized such an approach for testing the goodness of fit of the Cox PH model against a generalization of the Cox PH model that allows for crossing hazards. A detailed discussion of this method is beyond the scope of this article and interested reader is referred to the paper referenced above.
The authors would like to thank the referees for their helpful comments. The work of the first author was supported in part by NIH grant P30 CA 06927 and an appropriation from the Commonwealth of Pennsylvania. The authors would like to acknowledge the services provided by the High Performance Computing Facility at Fox Chase Cancer Center.
Karthik Devarajan, Division of Population Science, Fox Chase Cancer Center, Philadelphia, PA 19111.
Nader Ebrahimi, Division of Statistics, Northern Illinois University, DeKalb, IL 60115.