PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Commun Stat Theory Methods. Author manuscript; available in PMC 2010 January 5.
Published in final edited form as:
Commun Stat Theory Methods. 2009 January 1; 38(14): 2333–2347.
doi:  10.1080/03610920802536958
PMCID: PMC2802211
NIHMSID: NIHMS76096

Testing for Covariate Effect in the Cox Proportional Hazards Regression Model

Abstract

This paper presents methods for testing covariate effect in the Cox proportional hazards Model based on Kullback-Leibler divergence and Renyi’s information measure. Renyi’s measure is referred to as the information divergence of order γ (γ ≠ 1) between two distributions. In the limiting case γ → 1, Renyi’s measure becomes Kullback-Leibler divergence. In our case, the distributions correspond to the baseline and one possibly due to a covariate effect. Our proposed statistics are simple transformations of the parameter vector in the Cox proportional hazards model, and are compared with the Wald, likelihood ratio and Score tests that are widely used in practice. Finally, the methods are illustrated using two real-life data sets.

1 Introduction

The Cox proportional hazards (PH) model (Cox, 1972) offers a method for exploring the association of covariates with the survival time outcome variable often seen in medical and engineering studies. The observed data consist of independent observations on the triple (X, δ, z), where X is the minimum of a survival and censoring time pair (T, C), δ = I(TC) is the indicator of the event that a death has been observed and z = (z1, z2, …, zp)′ is a p–vector of covariates. The non-negative random variables T and C denote the survival and censoring times respectively, and are assumed to be independent conditional on the covariates.

The Cox PH model is perhaps most widely used in censored survival data analysis. For a given vector of observed covariates z, the hazard function of T is modeled as

equation M1
(1.1)

where λ0(t) is an unspecified function of time and β′ = (β1, ··· βp) is a row vector of unknown regression parameters. Alternate forms of the Cox PH model include

equation M2
(1.2)

and

equation M3
(1.3)

where S0(t) and Λ0(t) are the baseline survival and cumulative hazard functions, respectively. In (1.2) and (1.3), the survival and cumulative hazard functions of T are modeled respectively.

Checking for covariate effect in (1.1) is a matter of great interest. Several procedures have been described that include the Wald, likelihood ratio (LR) and Score tests among others (see Klein and Moeschberger (2003)). In this paper, we develop tests for covariate effect in the Cox PH model (1.1) that are easy to compute as well as perform better than these statistical tests. Our tests are based on Kullback-Leibler (KL) divergence as well as Renyi’s divergence. In Section 2, we derive expressions for our test statistics based on these measures. In Section 3, we discuss the implementation of the proposed tests and give two illustrative examples. Section 4 compares the performance of the proposed tests with existing tests in terms of power and consistency via extensive simulation studies. Section 5 provides some concluding remarks.

2 Testing for Covariate Effect in the Cox PH Model

In this case, our null hypothesis of interest is non-parametric and is given by

equation M4
(2.1)

The alternative is given by

equation M5
(2.2)

where the terms are as defined earlier. We observe that the null hypothesis corresponds to β = 0 indicating no covariate effect. We can now re-write our hypotheses as

equation M6
(2.3)

We use KL divergence (see Kullback (1978), Ebrahimi et al. (1994) and Ebrahimi and Soofi (2004) for more details) to discriminate between the null and alternative hypotheses.

In general, the KL divergence between two distribution functions F and F0 is

equation M7
(2.4)

where f and f0 are the density functions corresponding to F amd F0 respectively. I(F: F0) is also known as the directed divergence and measures the discrepancy between F and F0 in the direction of F. It is well known that I(F: F0) ≥ 0 and the equality holds if and only if F = F0. Similarly, one can also define the directed divergence

equation M8
(2.5)

which measures the discrepancy between F and F0 in the direction of F0. Combining (2.4) and (2.5), we can derive the divergence measure given by

equation M9
(2.6)

measuring the difficulty of discriminating between F0 and F.

Using (1.2), we can write the density functions under H1 and H0 as

equation M10
(2.7)

and

equation M11
(2.8)

respectively, where λF is the hazard function under H1. Using the above equations and (2.4) and (2.5), the directed divergence I1(F: F0|z) given covariate vector z, is given by

equation M12
(2.9)

and the directed divergence I2(F0: F |z) given covariate vector z, is given by

equation M13
(2.10)

The divergence measure J(F: F0|z) given the covariate vector z is given by

equation M14
(2.11)

Another information measure that one may use to discriminate between H0 and H1 is Renyi’s measure. This measure generalizes KL divergence and is referred to as the information divergence of order γ between two distributions F and F0 and corresponding densities f and f0. Conditional on the covariate vector z, this measure is defined as

equation M15
(2.12)

where γ is the order of information and γ ≠ 1. In the limit γ →1, (2.9) and (2.12) coincide. Information divergence of order γ, Rγ(F: F0|z), is symmetric only for the case equation M16, i.e., equation M17. This is better known as the Bhattacharya distance. For our application, we consider several values of γ in the interval (0, 1). For more details about Renyi’s measure, the interested reader is referred to Asadi et al. (2006).

It should be emphasized that an important feature of our proposed measures I1, I2, J and Rγ is that all these information measures are invariant under any non-singular transformation on the failure time data.

Let us denote the observed data by (Xj, δj, zj), j = 1, …, n, with Xj = min(Tj, Cj), the minimum of a failure and censoring time pair, δj = I(TjCj) is the indicator of the event that a death has been observed and zj = (z1j, …, zpj) is a p–vector of covariates associated with an individual j. Since evaluations of (2.9)–(2.12) require complete knowledge of β, then I1(F: F0|z), I2(F0: F|z), J(F: F0|z) and Rγ(F: F0|z) are not operational. We operationalize these measures by replacing β with [beta], the maximum partial likelihood (MPL) estimate based on (Xj, δj, zj), j = 1, ···, n.

Using (2.9) – (2.12), our discrimination information statistics based on KL divergence, J and Renyi’s measure are

equation M18
(2.13)

equation M19
(2.14)

equation M20
(2.15)

and

equation M21
(2.16)

respectively. It should be noted that one can use (2.13)–(2.16) to perform the global hypothesis of H0: β = β0. For global testing, one can easily verify that [beta] must be replaced by [beta]β0 in equations (2.13)(2.16). Often, one is interested in testing a hypothesis about a subset of the β’s, i.e., local tests. The hypothesis is then H0: β1 = β10, where equation M22. Here β1 is a q × 1 vector of the β’s of interest and β2 is the vector of the remaining pq β’s. In this situation I1, I2, J and Rγ are obtained by simply replacing β with ββ0 in equations (2.10)(2.12), where equation M23. Consequently, Î1, Î2, Ĵ and Rγ can be obtained by estimating ββ0 with [beta][beta]0, where [beta] and [beta]0 are the MPL estimators of β under H1 and H0 respectively.

The following theorem gives asymptotic distributions of our proposed test statistics.

Theorem 1

Î1; Î2; Ĵ and Rγ are maximum likelihood estimators and are asymptotically normal with means I1; I2; J and Rγ respectively.

Proof

As seen from (2.13)–(2.16), Î1, Î2, Ĵ and Rγ are simple transformations of the MPL estimator [beta]. Using the fact that [beta] is asymptotically multivariate normal with mean β and the invariance property of maximum likelihood estimators, we see that the proposed statistics are in fact maximum likelihood estimators and are asymptotically normal with the means stated in the theorem. See Klein and Moeschberger (2003) for more details and for the variance-covariance matrix of [beta]. This concludes the proof.

Using Theorem 1, we compute the exact variances of the proposed statistics Î1, Î2 and Ĵ under the null hypothesis. For simplicity, we derive the variance estimators for the case of a single covariate z. The extension to multiple covariates is straightforward. Given that [beta] is asymptotically normal with mean β = 0 and variance σ2 under the null, and using the moment generating function of the normal random variable (i.e., if [beta] ~ N(0, σ2), then for zi fixed, i = 1, …, n, equation M24 and equation M25, we have,

equation M26

Similarly,

equation M27

and

equation M28

For our measure based on Renyi’s divergence Rγ, the variance estimator is obtained using the delta method, simply by expanding Rγ([beta]) around β.

equation M29

where equation M30. Using the first three terms in the expansion and the fact that E([beta]2k+1) = 0 for k odd and equation M31 for k even, we obtain a reasonable approximation to the variance estimator given by

equation M32

3 Implementation of the Proposed Tests and Examples

3.1 Implementation

In this section, we describe the implementation of our proposed test statistics for testing (2.1) against (2.2). One can also follow similar steps for global testing as well as local testing. We apply the following steps to implement Î1(F: F0|z), Î2(F0: F|z), Ĵ(F: F0|z), and Rγ(F: F0|z):

Step 1: Use partial likelihood procedure to estimate β (see Klein & Moeschberger (2003)). Denote the estimate by [beta].

Step 2: Replace β by [beta] in equations (2.10)(2.12) to obtain (2.13)(2.16).

Step 3: From Theorem 1, it is clear that Î1, Î2, Ĵ and Rγ in (2.13) – (2.16) have asympotically normal distributions under H0. As shown in the previous section, the means and variances of Î1, Î2, Ĵ and Rγ can easily be obtained using Theorem 1. In fact, under H0, the means of all the proposed test statistics are zero.

Step 4: (a) Using Îi, reject H0 if equation M33, i = 1, 2, (b) Using J, reject H0 if equation M34, (c) Using Rγ, reject H0 if equation M35 where p is the number of parameters in the model. Equivalently, we can also use the normalized z-scores and the standard normal distribution to test our hypothesis for each proposed test.

Remark

From Theorem 1, it is clear that the tests specified above using Î1, Î2, Ĵ and Rγ are all consistent tests.

We implemented our methods using the R statistical system (R Development Core Team (2005)). The MPLE [beta] was computed using the function coxph and the variance-covariance matrix of [beta] under H0 (Var([beta]0)) was computed using the function vcov. All computations for the simulations, including those pertaining to the variances of the proposed statistics, were programmed in R. we utilized the parallel implementation in the snow package in R using the High Performance Computing cluster at Fox Chase Cancer Center.

3.2 Examples

In this section we illustrate our approach using two real-life examples. For Renyi’s measure, we only compute Rγ for γ = .5 which gives us the symmetric case simply for the purpose of illustration.

Example 1

The data consist of times to remission (in weeks) of 42 leukemia patients from Kalbfleisch & Prentice (2002). This data set is available in the R statistical system (R Development Core Team (2005)). The patients are randomized into two groups; a control group and the treatment group for which the drug 6-MP was administered. Each group consists of 21 patients. The covariate is group membership specified by the indicator variable z where z = 0 denotes the control group and z = 1 denotes the treatment group.

We are interested in testing for covariate effect i.e., if there is any significant treatment effect (z = 1) relative to the control group (z = 0). Here p = 1 and the null hypothesis is

equation M36

corresponding to no covariate effect against the two-sided alternative

equation M37

The Cox PH model was fit to this data. The estimate of the regression coefficient [beta] is 1.57 with a standard error of 0.412 and a significant p-value of 1.4e-04. The estimate of the relative risk is e[beta] = 4.81 i.e., the risk is 4.81 times higher for a patient in the control group relative to that of a patient in the treatment group. Using the MPL estimate of β, [beta] = 1.57, our estimate of divergence given treatment group membership i.e., z = 1 is given by

equation M38

Also, Ĵ(F: F0|z = 0) = 0. Consequently from (2.15), Ĵ(F:F0|z) = 63.51 and equation M39 which gives a p-value < 1e-09 using an asymptotic χ2 distribution with 1 degree of freedom under the null hypothesis. Hence, we reject the null hypothesis of no covariate effect at a significance level 0.05. Similarly from (2.13), Î1(F: F0|z) = 47.14 and equation M40 which gives a p-value < 1e-09, and from (2.14), Î2(F0: F|z) = 16.34 and equation M41 which gives a p-value < 1e-09. Finally, using (2.16), Renyi’s divergence of order γ for γ = 0.5 is R0.5(F: F0|z) = 11.83 and equation M42 which also gives a p-value < 10−9. In this example, all our proposed tests confirm that the treatment has a positive effect. In the case of Wald, LR and Score tests, the computed statistics (and p-values) were 14.5 (1.38e-04), 16.4 (5.26e-05) and 17.3 (3.28e-05), respectively. Hence, these standard tests confirm the results from our proposed statistics.

Example 2

We now consider the leukemia data described in Feigl & Zelen (1965). This data set consists of uncensored survival times of 33 patients who died from acute myelogenous leukemia, and is also available in the R statistical system (R Development Core Team (2005)). The patients were also factored into 2 groups according to the presence or absence of a morphologic characteristic of white blood cells (AG). The white blood cell count at the time of diagnosis was also measured for each patient (WBC).

We are interested in testing the effects of AG and WBC on the survival time of patients. Here, AG is a binary variable (present/absent) and WBC is a continuous covariate. Also, p = 2 and the null hypothesis is

equation M43

corresponding to no effects due to AG and WBC against the two-sided alternative

equation M44

that at least one covariate has an effect on survival time. We fit the Cox PH model to this data with AG (z1), and log(WBC) (z2), as our covariates, and performed a global test of the null hypothesis. AG is coded as 1 for present and 0 for absent. Based on this, the MPL estimates of the regression coefficients are [beta]1 = −1.069 and [beta]2 = 0.368 with corresponding p-values 0.013 and 0.017, respectively. That is, AG has a negative effect and WBC has a positive effect on survival time, and both effects are statistically significant. Now, our estimate of the divergence measure given by (2.15) is Ĵ(F: F0|z) = 901190.8. Consequently, equation M45 which gives a p-value of about .0031 using an asymptotic χ2 distribution with 2 degrees of freedom under the null hypothesis. Hence, we reject the null hypothesis of no covariate effect at a significance level 0.05. Similarly from (2.13), Î1(F: F0|z) = 716.38 and equation M46 which gives a p-value of 0.0086. Also, Î2(F0: F|z) = 900474.4 and equation M47 which gives a p-value of about .0017. Finally, using (2.16), Renyi’s divergence of order γ for γ = 0.5 is R0.5(F: F0|z) = 177.78 and equation M48 which gives a p-value of .0015. It is obvious that all proposed tests confirm an effect due to the covariates. In the case of Wald, LR and Score tests, the computed statistics (and corresponding p-values) were, respectively, 15.1 (0.000537), 15.6 (0.000401) and 16.5 (0.000263). Hence, these standard tests confirm the results from our proposed statistics. It is worth noting here that if we were interested in performing a test of the local hypothesis H0: βi = 0 vs. H1: βi ≠ 0, i = 1 or 2, then one could use the approach described in Section 2 on page 6.

4 Comparison of the Proposed Test with Existing Methods

We compared the proposed test statistics Î1(F: F0|z), Î2(F0: F|z), Ĵ(F: F0|z) and Rγ(F: F0|z) for testing covariate effect in the Cox PH model with existing methods such as the Wald, Likelihood Ratio and Score tests. For Renyi’s divergence, we considered γ = 0.1, 0.25, 0.5 and 0.75 in our simulations. We used empirical estimates of power and consistency based on 5000 simulations for various sample sizes and censoring patterns. We considered the two sample problem (binary covariate case) as well the continuous covariate case.

For the two sample problem, simulations were performed using randomly generated data from exponential distributions with means 1 and 0.5. Sample sizes ranged from 5 to 20 per group (total sample size ranged from 10 to 40). Independent censoring based on a uniform distribution over a fixed interval (0,B) was applied to result in 25% censored observations. For the continuous covariate case, we randomly generated data from exponential distributions with means 0.5, 1 or 2 using the method outlined in Bender et al. (2005). For a sample of size n, we created a covariate vector by randomly permuting the vector (1/n, 2/n, …, n/n) once or generating a single sample from a standard normal or a uniform (0, 1) distribution and using this covariate vector across all simulations. Sample sizes ranged from 10 to 50 and simulations were performed in the manner outlined for the two sample problem.

For each sample size, we computed the power and consistency at various levels of the Type I error α (i.e., α = 0.01, 0.05, 0.1, 0.15, 0.2, 0.25) based on 5000 simulations for each test. The simulation results for the two sample problem are tabulated in Tables 4.1 through 4.4. For the continuous covariate case, results are tabulated in Tables 4.5 through 4.8. In each table, the first row corresponding to each test statistic indicates the power and the second row (numbers enclosed within paranthesis) indicates the consistency for each α. These results demonstrate that the proposed tests outperform all the leading tests in terms of power. In particular, we observe from the simulation results that in most cases, Î1 has better performance relative to all other proposed tests in terms of both power and consistency. This is true across sample sizes and censoring proportions. Also, as γ →1, there is an improvement in the consistency of Rγ. There is also an overall improvement in its power and in some cases, higher values of γ result in more power than Î1.

Table 4.1
Power Comparison and Consistency of Proposed Tests: Binary Covariate (25% censoring)
Table 4.4
Power Comparison and Consistency of Proposed Tests: Binary Covariate (25% censoring)
Table 4.5
Power Comparison and Consistency of Proposed Tests: Continuous Covariate (25% censoring)
Table 4.8
Power Comparison and Consistency of Proposed Tests: Continuous Covariate (25% censoring)

Overall, we find that for some loss in consistency, there is a substantial increase in power for almost all of the proposed tests relative to the three standard tests. The only exception to this is Î2. It is interesting to note that the realized α for the proposed tests stabilize at around a Type I error cut-off of 0.15. This is not observed in the case of the three standard tests. It is well known that the score test is the most powerful test, asymptotically, under the proportional hazards assumption. However, for small to moderate sample sizes, we find that the proposed tests demonstrate better performance. In particular, we recommend using our tests for small sample sizes. For larger sample sizes, we found that the performance of the proposed and standard tests are very similar (data not shown). In that case, use of either test (proposed or standard) would be appropriate. For the continuous covariate case, the overall results were similar across these three different approaches used for creating the covariate vector. This demonstrates that the proposed tests are insensitive to the choice of the covariate structure. Furthermore, for larger censoring proportions (such as 50% censored observations), we found the performance of the proposed and standard tests to be similar overall (data not shown). It should be mentioned that the critical value for the Wald statistic was based on the normalized z-score and those for all other test statistics in this study were based on a Chi-square distribution with 1 degree of freedom at a significance level of 0.05.

5 Concluding Remarks

In this paper, we proposed several tests for covariate effect in the Cox PH model based on KL divergence and Renyi’s divergence. These tests are useful for discriminating between the baseline distribution and one possibly due to the covariates in the Cox PH model. An important advantage of these discrimination information functions is that all of them are invariant under any non-singular transformation of the failure time data.

Our proposed tests are moderately easy to compute. An attractive feature of the proposed methods is that the test statistics are simple functions of the MPLE [beta] and independent of the baseline distribution. Using the results derived in this paper, one can easily implement our tests in a readily available, open source programming environment such as the R statistical system. All these tests, particularly Î1 and Rγ as γ → 1, perform very well in terms of power compared with other leading tests, and in terms of both power and consistency relative to the other proposed tests. For small to moderate sample sizes, the increase in power we observe is substantial relative to the loss in consistency for all these tests compared with the three standard tests. Based on our results, the proposed tests appear to be an attractive alternative to the standard tests for small sample sizes.

While we have specifically focused on testing for any covariate effect in the widely used Cox PH model using a class of discrimination information functions, one could be interested in applying similar methods to other survival models such as the accelerated failure time (AFT) model (Etezadi-Amoli & Ciampi, 1985) or the additive hazards (AH) model (Tibshirani & Ciampi, 1983). Unlike the Cox PH model, the AFT model allows for crossing hazard curves. For a test of the null hypothesis of no covariate effect against the alternative specified by the AFT model, the discrimination information functions (analogous to the quantities specified in (2.9)–(2.12)) would be dependent on the baseline hazard. This is also the case with the AH model. In that situation, one would have to apply spline approximations for parameter estimation as described in Devarajan (1999), and then follow the methods described in this paper for testing the covariate effect in these models. Devarajan & Ebrahimi (2001) utilized such an approach for testing the goodness of fit of the Cox PH model against a generalization of the Cox PH model that allows for crossing hazards. A detailed discussion of this method is beyond the scope of this article and interested reader is referred to the paper referenced above.

Table 4.2
Power Comparison and Consistency of Proposed Tests: Binary Covariate (25% censoring)
Table 4.3
Power Comparison and Consistency of Proposed Tests: Binary Covariate (25% censoring)
Table 4.6
Power Comparison and Consistency of Proposed Tests: Continuous Covariate (25% censoring)
Table 4.7
Power Comparison and Consistency of Proposed Tests: Continuous Covariate (25% censoring)

Acknowledgments

The authors would like to thank the referees for their helpful comments. The work of the first author was supported in part by NIH grant P30 CA 06927 and an appropriation from the Commonwealth of Pennsylvania. The authors would like to acknowledge the services provided by the High Performance Computing Facility at Fox Chase Cancer Center.

Contributor Information

Karthik Devarajan, Division of Population Science, Fox Chase Cancer Center, Philadelphia, PA 19111.

Nader Ebrahimi, Division of Statistics, Northern Illinois University, DeKalb, IL 60115.

References

  • Asadi M, Ebrahimi N, Soofi E. Dynamic generalized information measures. Statistics and Probability letters. 2006;71:85–98.
  • Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005;24:1713–1723. [PubMed]
  • Cox DR. Regression Models and Life Tables (with Discussion) Journal of the Royal Statistical Society-Series B. 1972;34:187–202.
  • Devarajan K. Ph.D. Dissertation. Division of Statistics, Northern Illinois University; DeKalb, Illinois: 1999. Inference for a non-proportional hazards regression model and applications.
  • Devarajan K, Ebrahimi N. Goodness-of-Fit Tests and Model Validity, Statistics for Industry and Technology. Birkhauser Publishers; Boston: 2001. Goodness of fit testing for the Cox proportional hazards model using Kullback-Leibler discrimination function; pp. 235–251.
  • Ebrahimi N, Habibullah M, Soofi E. Testing exponentiality based on Kullback-Leibler information. Journal of the Royal Statistical Society-Series B. 1994;54:739–748.
  • Ebrahimi N, Soofi E. Information functions for reliability. In: Soyer R, Mazzuchi TA, Singpurwalla N, editors. Mathematical Reliability: An Expository Perspective. Kluwer Academic; Boston: 2004.
  • Etezadi-Amoli J, Ciampi A. A General Model for Testing the Proportional Hazards and the Accelerated Failure Time Hypothesis in the Analysis of Censored Survival Data with Covariates. Communications in Statistics-A. 1985;14:651–667.
  • Feigl P, Zelen M. Estimation of exponential survival probabilities with con-comitant information. Biometrics. 1965;21:826–838. [PubMed]
  • Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2. John Wiley; New York: 2002.
  • Klein John, Moeschberger ML. Survival analysis, techniques for censored and truncated data. 2. Springer; New York: 2003.
  • Kullback S. Information Theory and Statistics. Dover Publications Inc.; New York: 1978.
  • R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2005. URL http://www.R-project.org.
  • Tibshirani RJ, Ciampi A. A family of proportional- and additive-hazards models for survival data. Biometrics. 1983;39:141–147. [PubMed]