

Article sections

- Abstract
- Introduction
- Local Power of the Score Test
- Implications for Oracles
- Simulations
- High Correlation and Information-Carrying Covariates
- Discussion
- References


Int J Biostat. 2010 January 1; 6(1): Article 12.

Published online 2010 March 29. doi: 10.2202/1557-4679.1231

PMCID: PMC2854087

Josue G. Martinez,^{*} Raymond J Carroll,^{†} Samuel Muller,^{‡} Joshua N. Sampson,^{**} and Nilanjan Chatterjee^{††}

Copyright © 2010 The Berkeley Electronic Press. All rights reserved


We consider the problem of score testing for certain low-dimensional parameters of interest in a model that could include finite but high-dimensional secondary covariates and associated nuisance parameters. We investigate the potential gain in power from reducing the dimensionality of the secondary variables via oracle estimators such as the Adaptive Lasso. As an application, we use a recently developed framework for score tests of association between a disease outcome and an exposure of interest in the presence of a possible interaction of the exposure with other co-factors of the model. We derive the local power of such tests and show that if the primary and secondary predictors are independent, then having an oracle estimator does not improve the local power of the score test. Conversely, if they are dependent, there is the potential for power gain. Simulations are used to validate the theoretical results and to explore how much correlation between the primary and secondary covariates is needed before the oracle estimator improves the power of the test. Our conclusions are likely to hold more generally beyond the model of interactions considered here.

Dimension reduction in regression models via regularization has undergone vigorous development, with oracle methods such as SCAD (Fan and Li, 2001) and the Adaptive Lasso (Zou, 2006), as well as other popular methods such as the Lasso (Tibshirani, 1996). We say that a method is an oracle method if it is consistent for model selection, i.e., it selects the correct model with probability tending to 1 as the sample size *n* → ∞, and it is asymptotically efficient for estimating the non-zero coefficients.

We are interested in testing for the effect of a low-dimensional covariate *Z* when the model contains other, higher-dimensional covariates (*X*, *S*) that govern nuisance parameters. As in Fan and Li (2001) and Zou (2006), we work in the setting where the number of covariates is large but smaller than the sample size, i.e., the dimensionality is fixed and does not increase with the sample size. For example, in logistic regression with a binary response *Y*, the model would be

$$\mathrm{pr}(Y = 1 \mid X, S, Z) = H(X^{T}\kappa + S^{T}\eta + Z^{T}\theta), \qquad (1)$$

where *H*(·) is the logistic distribution function, *H*(*x*) = 1/{1 + exp(−*x*)}. The aim is to test the null hypothesis *H*_{0}: θ = 0 and to obtain an asymptotically valid significance level for that test. In model (1), (κ, η) is a nuisance parameter relating to (*X*, *S*). One cannot use an oracle method to make valid probability statements about the null hypothesis, i.e., run SCAD or the Adaptive Lasso on all of (*X*, *S*, *Z*) and then somehow test for θ. This has been emphasized in a series of papers beginning with Leeb and Pötscher (2005), who note that the limiting distribution of the oracle estimate of θ is not normal under contiguous alternatives.

Our emphasis here differs sharply from the usual dimension-reduction framework: the parameters (κ, η) are simply nuisance parameters and are not the main focus of interest.

As a motivating example for this article, we consider an extension of model (1) that Chatterjee et al. (2006) described for testing the association of a disease outcome with an exposure when that exposure may interact with other co-factors that influence the risk of the disease. See also Maity et al. (2009) for a semiparametric extension. In Chatterjee et al., *Z* could be a specific genetic or environmental exposure of interest, *X* could be an array of higher-dimensional genetic and/or environmental covariates that may interact with *Z*, and *S* could be basic covariates, such as age and sex, for which the model needs to be adjusted. The interest focuses on whether *Z* is associated with the disease outcome *Y*. To improve power, Chatterjee et al. proposed a parsimonious interaction model between *Z* and *X*, in the style of Tukey's one-degree-of-freedom test, of the form

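$$\mathrm{pr}(Y = 1 \mid X, S, Z) = H\{X^{T}\kappa + S^{T}\eta + Z^{T}\theta + \gamma\,(X^{T}\kappa)(Z^{T}\theta)\}, \qquad (2)$$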

where the scalar γ captures the interaction. A full description of this method is given in the Appendix. The null hypothesis of no association between *Y* and *Z* corresponds to *H*_{0}: θ = 0. A complication, however, is that under θ = 0 the parameter γ also disappears from the model and hence is not identifiable from the data. Nevertheless, Chatterjee et al. noticed that for each fixed value of γ, model (2) can be used to construct a valid score test of *H*_{0}: θ = 0. They proposed using the maximum of these score statistics over a range of values of γ as the final test statistic. They also observed that the score test has particular computational advantages, because under the null hypothesis model (2) reduces to a standard logistic regression model involving only the main effects of *X* and *S*.
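To see concretely why γ drops out under the null, here is a minimal sketch that assumes model (2) has the Tukey-style form pr(*Y* = 1 | *X*, *S*, *Z*) = *H*{*X*^{T}κ + *S*^{T}η + *Z*^{T}θ + γ(*X*^{T}κ)(*Z*^{T}θ)}, which is our reading of the one-degree-of-freedom interaction model of Chatterjee et al. (2006). The function name `tukey_prob` and the numerical inputs are illustrative only.

```python
import math


def H(t: float) -> float:
    """Logistic distribution function H(t) = 1 / {1 + exp(-t)}."""
    return 1.0 / (1.0 + math.exp(-t))


def tukey_prob(x_kappa: float, s_eta: float, z_theta: float, gamma: float) -> float:
    """pr(Y = 1 | X, S, Z) under the assumed form
    H{X'kappa + S'eta + Z'theta + gamma * (X'kappa) * (Z'theta)},
    taking the three linear indices X'kappa, S'eta and Z'theta as inputs."""
    return H(x_kappa + s_eta + z_theta + gamma * x_kappa * z_theta)


gammas = (-2.0, 0.0, 2.0)
# Under H0 (theta = 0, hence Z'theta = 0) the interaction term is zero for every gamma,
# so the data carry no information about gamma.
p_null = [tukey_prob(0.8, -0.3, 0.0, g) for g in gammas]
# Away from the null (Z'theta = 0.2), gamma does change the probability.
p_alt = [tukey_prob(0.8, -0.3, 0.2, g) for g in gammas]
print(p_null)
print(p_alt)
```

Because the null fit never sees γ, one score statistic can be computed per fixed γ from the same null fit, which is what makes the maximum over γ cheap.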

Motivated by model (2) and Chatterjee et al. (2006), we study the following algorithm. Let the loglikelihood function be denoted by $\mathcal{L}(Y, X^{T}\kappa, S^{T}\eta, \theta, \zeta)$, where ζ collects any remaining nuisance parameters; for example, in linear regression ζ would be the regression error variance. The usual score test would fit the null model $\mathcal{L}(Y, X^{T}\kappa, S^{T}\eta, 0, \zeta)$ using all components of (*X*, *S*). An oracle would do the following, which we call an *oracle score test*.

- Under the null hypothesis *H*_{0}: θ = 0, fit the parameters (κ, η) by using an oracle estimator that consistently chooses the correct null model.
- Perform a score test using the selected components of (*X*, *S*).
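The two steps above can be sketched for the logistic model (1) with a scalar *Z*. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: the helper names (`fit_null_logistic`, `score_test`) are hypothetical; the "oracle" is represented by passing the indices of the truly information-carrying columns (in practice these would come from an estimator such as the Adaptive Lasso); each row of `W` already contains an intercept; and the null model is fit by plain Newton-Raphson.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def H(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_null_logistic(W, y, iters=25):
    """ML fit of pr(Y = 1 | W) = H(W'beta) under H0: theta = 0, by Newton-Raphson."""
    p = len(W[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = [H(sum(wj * bj for wj, bj in zip(w, beta))) for w in W]
        grad = [sum((yi - mi) * w[j] for w, yi, mi in zip(W, y, mu)) for j in range(p)]
        info = [[sum(mi * (1 - mi) * w[j] * w[k] for w, mi in zip(W, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def score_test(W, z, y, keep):
    """Score statistic for H0: theta = 0 in H(W'beta + z * theta),
    using only the columns of W listed in `keep` as nuisance covariates."""
    Wk = [[w[j] for j in keep] for w in W]
    p = len(keep)
    beta = fit_null_logistic(Wk, y)
    mu = [H(sum(wj * bj for wj, bj in zip(w, beta))) for w in Wk]
    wt = [m * (1 - m) for m in mu]                                    # H^(1) weights
    u = sum((yi - mi) * zi for yi, mi, zi in zip(y, mu, z))           # score for theta
    i_tt = sum(v * zi * zi for v, zi in zip(wt, z))
    i_tb = [sum(v * zi * w[j] for v, zi, w in zip(wt, z, Wk)) for j in range(p)]
    i_bb = [[sum(v * w[j] * w[k] for v, w in zip(wt, Wk)) for k in range(p)] for j in range(p)]
    omega = i_tt - sum(a * c for a, c in zip(i_tb, solve(i_bb, i_tb)))  # efficient information
    return u * u / omega                                              # approx. chi-squared(1) under H0


# Illustration under H0: intercept + 6 covariates, only the first covariate is informative.
random.seed(1)
n = 400
W = [[1.0] + [random.gauss(0, 1) for _ in range(6)] for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < H(-0.5 + w[1]) else 0 for w in W]
t_all = score_test(W, z, y, keep=list(range(7)))   # ordinary score test, all covariates
t_oracle = score_test(W, z, y, keep=[0, 1])        # oracle knows which columns matter
print(round(t_all, 3), round(t_oracle, 3))
```

Under *H*_{0} each statistic is approximately χ²_{1}. Because *Z* is generated independently of the covariates here, Theorem 1 below indicates that the two versions have the same local power; the sketch itself only computes the statistics and does not establish that.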

This paper addresses a simple question: does having access to an oracle under the null model improve the local power of the test, i.e., is the oracle score test more powerful than the ordinary score test that does no model selection under the null? We will show that the answer depends on how (*X*, *S*) are correlated with *Z*: when they are independent, access to an oracle brings no gain in local power. We will demonstrate, however, that if (*X*, *S*) has information-carrying elements that are highly correlated with *Z*, then having access to an oracle can be valuable.

The paper is organized as follows. In Section 2, we develop the general framework and state the main result on the local power of the score test. In Section 3, we discuss the implications of this result for oracle methods. In Section 4, we give simulation evidence suggesting that access to an oracle may not help much in a variety of practical problems. In Section 5, we briefly describe a setting in which access to an oracle is useful, although we point out some peculiar behavior of the Adaptive Lasso implemented with 10-fold cross-validation. All technical details are given in the Appendix.

Here we compute the local power of the score test against alternatives of the form θ = λ*n*^{−1/2}.

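Suppose that the loglikelihood function is $\mathcal{L}(Y, X^{T}\kappa, S^{T}\eta, \theta, \zeta)$ and that the null hypothesis is *H*_{0}: θ = 0. Let *β* = (κ^{T}, η^{T}, ζ^{T})^{T}. In the general form, the function $\mathcal{L}(\cdot)$ has first derivatives $\mathcal{L}_{\theta}(\cdot)$ and $\mathcal{L}_{\beta}(\cdot)$ and second derivatives $\mathcal{L}_{\theta\theta}(\cdot)$, $\mathcal{L}_{\theta\beta}(\cdot)$ and $\mathcal{L}_{\beta\beta}(\cdot)$. Let the Fisher information matrix for (θ, *β*) be partitioned to have diagonal elements $\mathcal{I}_{\theta\theta}$ and $\mathcal{I}_{\beta\beta}$, with top right-hand off-diagonal element $\mathcal{I}_{\theta\beta} = \mathcal{I}_{\beta\theta}^{T}$. Under the null hypothesis, with (•) = (*Y*, *X*^{T}κ, *S*^{T}η, 0, ζ), the Fisher information matrix is

$$\begin{pmatrix} \mathcal{I}_{\theta\theta} & \mathcal{I}_{\theta\beta} \\ \mathcal{I}_{\beta\theta} & \mathcal{I}_{\beta\beta} \end{pmatrix} = -E\begin{pmatrix} \mathcal{L}_{\theta\theta}(\bullet) & \mathcal{L}_{\theta\beta}(\bullet) \\ \mathcal{L}_{\beta\theta}(\bullet) & \mathcal{L}_{\beta\beta}(\bullet) \end{pmatrix}.$$

Further, define $\Omega = \mathcal{I}_{\theta\theta} - \mathcal{I}_{\theta\beta}\mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\beta\theta}$. Let $\hat{\beta}_{n} = (\hat\kappa^{T}, \hat\eta^{T}, \hat\zeta^{T})^{T}$ be the maximum likelihood estimate of *β* under the null hypothesis, let $(\hat\bullet_{i}) = (Y_{i}, X_{i}^{T}\hat\kappa, S_{i}^{T}\hat\eta, 0, \hat\zeta)$, and let $\hat\Omega$ be a consistent estimate of Ω.

The score test statistic is

$$T_{n} = \Big\{n^{-1/2}\sum_{i=1}^{n}\mathcal{L}_{\theta}(\hat\bullet_{i})\Big\}^{T}\hat\Omega^{-1}\Big\{n^{-1/2}\sum_{i=1}^{n}\mathcal{L}_{\theta}(\hat\bullet_{i})\Big\}.$$

Under the null hypothesis, $T_{n} \to \chi^{2}_{p_\theta}$ in distribution, the central chi-squared distribution with *p*_{θ} degrees of freedom, where *p*_{θ} is the dimension of θ. The following is a well-known but useful result.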

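**Result 1** *Under the local alternatives* θ = λ*n*^{−1/2}, *the score test statistic converges in distribution to* $\chi^{2}_{p_\theta}(\lambda^{T}\Omega\lambda/2)$, *the noncentral chi-squared distribution with p*_{θ} *degrees of freedom and noncentrality parameter* λ^{T}Ωλ/2.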
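For a scalar θ (*p*_{θ} = 1), the local power implied by a given noncentrality can be computed with the Python standard library alone, because a noncentral χ²_{1} variable is the square of a N(μ, 1) variable. The sketch assumes the convention in which the noncentrality is δ = μ²; texts differ in how they scale this parameter (Result 1 writes it as λ^{T}Ωλ/2), so the scaling should be checked before plugging in a value. The function name `local_power_1df` is hypothetical.

```python
from statistics import NormalDist


def local_power_1df(delta: float, alpha: float = 0.05) -> float:
    """Power of the level-alpha test that rejects when a 1-df chi-squared statistic
    exceeds its upper-alpha critical value, if the statistic is actually noncentral
    chi-squared(1) with noncentrality delta = mu**2, i.e. the square of a N(mu, 1) variable."""
    nd = NormalDist()                   # standard normal
    c = nd.inv_cdf(1.0 - alpha / 2.0)   # sqrt of the chi-squared(1) critical value
    mu = delta ** 0.5
    # (N(mu, 1))^2 > c^2  <=>  N(mu, 1) > c  or  N(mu, 1) < -c
    return nd.cdf(mu - c) + nd.cdf(-mu - c)


for delta in (0.0, 1.0, 4.0, 7.85):
    print(delta, round(local_power_1df(delta), 3))
```

At δ = 0 the function returns the nominal level α, and power increases monotonically in δ, so comparing two noncentralities (for example, with and without an oracle) amounts to comparing their Ω values for a fixed λ.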

We now show that if *Z* and (*X*, *S*) are independent then, under two simple assumptions that usually hold, using an oracle to reduce the dimension of the fit yields no increase in local power compared with using all components of (*X*, *S*) rather than only those that predict the response. This has an important implication. In perhaps the vast majority of cases, one would not expect the information-carrying components of (*X*, *S*) to be highly correlated with *Z*. Thus, in much of actual practice, oracle estimation will not substantially improve the power of a score test.

We make the following assumptions. Assumptions 1 and 2 both hold for the score tests of models (1) and (2).

**Assumption 1** Under the null hypothesis, the distribution of *Y* depends on (*X*, *S*, *Z*) only through (*X*^{T}κ, *S*^{T}η).

**Assumption 2** The score test statistic is invariant to location changes in *Z*, so that for example it has the same value whether *Z* or *Z* – *E*(*Z*) is used.

**Result 2** *Suppose that Z and* (*X, S*) *are independent, and that Assumptions 1–2 hold. Then under the contiguous alternatives that* θ = θ_{n} = λ*n*^{−1/2}, *the local power of the score test depends on W* = (*X, S*) *only through* (*X*^{T}κ, *S*^{T}η).

Using Result 2, our main result follows:

**Theorem 1** *Make the same assumptions as in Result 2, including that Z is independent of (X, S). Partition β into information and non-information carrying components, so that β* = (*β*_{1}, *β*_{2} = 0). *Partition the components of W* = (*X, S*) *similarly as* (*W*_{1}, *W*_{2}). *Then, in the score test, knowing that β*_{2} = 0 *and using only W*_{1} *does not affect the local power of the score test. It follows that oracle estimators of β will have no more local power than the naive test that does not attempt dimension reduction.*

**Corollary 1** *Using Results 1–2 and Result 3 in Appendix A.1, if Z is not independent of* (*X, S*), *then in general oracle estimators will have greater power.*
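Theorem 1 and Corollary 1 both come down to the efficient information Ω = $\mathcal{I}_{\theta\theta} - \mathcal{I}_{\theta\beta}\mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\beta\theta}$ computed with and without the non-information-carrying nuisance components *β*_{2}. The sketch below evaluates Ω for a scalar θ from two small, made-up information matrices (hypothetical, chosen only to be symmetric positive definite): one with zero cross-information between θ and *β*, as happens when *Z* is independent of (*X*, *S*) and centered, and one with nonzero cross-information. The function name `efficient_information` is also hypothetical.

```python
def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [v] for row, v in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def efficient_information(info, keep):
    """Omega = I_tt - I_tb I_bb^{-1} I_bt for a scalar theta stored at index 0 of `info`,
    treating only the nuisance indices listed in `keep` as unknown parameters."""
    i_tt = info[0][0]
    i_tb = [info[0][j] for j in keep]
    i_bb = [[info[j][k] for k in keep] for j in keep]
    return i_tt - sum(a * c for a, c in zip(i_tb, solve(i_bb, i_tb)))


# Index 0 = theta, 1 = information-carrying W1, 2-3 = non-information-carrying W2 (beta2 = 0).
indep = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 2.0, 0.3, 0.1],
         [0.0, 0.3, 1.0, 0.2],
         [0.0, 0.1, 0.2, 1.0]]
dep = [[1.0, 0.4, 0.3, 0.2],
       [0.4, 2.0, 0.3, 0.1],
       [0.3, 0.3, 1.0, 0.2],
       [0.2, 0.1, 0.2, 1.0]]
full, oracle = [1, 2, 3], [1]
for name, info in (("independent", indep), ("dependent", dep)):
    print(name, efficient_information(info, full), efficient_information(info, oracle))
```

With zero cross-information both versions give Ω = $\mathcal{I}_{\theta\theta}$, so by Result 1 the local power is identical; with nonzero cross-information, dropping the *β*_{2} block can only enlarge Ω, because projecting onto fewer nuisance directions removes less of the score.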

It is tempting to believe that, because (*X*, *S*) are independent of *Z*, estimates of *β* = (κ^{T}, η^{T}, ζ^{T})^{T} must be independent of estimates of θ. This is not the case in general under alternatives, including in our example.

In this section, we report two simulations of the Chatterjee method for model (2); the method itself is described in Appendix A.3. In all cases, 1,000 data sets were randomly generated. We first generated *X*_{*} of dimension *p*_{κ} = 100, *S* of dimension *p*_{η} = 10, and a scalar *Z*_{*}. In all cases, marginally *X*_{*} and

The sample size was *n* = 5,000. All components of κ were zero except the first three, which had values (1.0, 1.0, 0.5). All components of η were zero except the first, which had value 0.75. The values of θ ranged from the null, θ = 0, to θ = 0.07 in steps of 0.01. The Lasso, using the penalized package in R (Goeman, 2009a, 2009b), and the Adaptive Lasso, using the glmnet package in R (Friedman et al., 2009), were used to estimate *β* under the null hypothesis. We also refitted *β* using only the four information-carrying predictors, as well as using all 110 components of (*X*, *S*).

Although our emphasis is on statistical power, we also evaluate the Type I error of the tests. As the results below show, all the methods have reasonable Type I error control.

In the first simulation, (*X*_{*}, *S*

In the second simulation, we set all components of (*X*_{*}, *S*

Simulation of power for a 5%-level test in the case that (*X*, *S*, *Z*) are mutually correlated. Here the theory says that oracle methods should have higher local power than the method that uses all 110 predictors.

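The lack of a large increase in power for the oracle estimator in this second simulation is predictable from our theory. Write *W* = (*X*^{T}, *S*^{T})^{T} and recall that *β* = (κ^{T}, η^{T})^{T}. Fix γ. Let *H*^{(1)}(*x*) = *H*(*x*){1 – *H*(*x*)}. Detailed calculations based on Theorem 1 show that the elements of the information matrix are given as

$$\mathcal{I}_{\theta\theta} = E\{H^{(1)}(W^{T}\beta)(1 + \gamma X^{T}\kappa)^{2}ZZ^{T}\}, \quad \mathcal{I}_{\theta\beta} = E\{H^{(1)}(W^{T}\beta)(1 + \gamma X^{T}\kappa)ZW^{T}\}, \quad \mathcal{I}_{\beta\beta} = E\{H^{(1)}(W^{T}\beta)WW^{T}\},$$

from which Ω and the noncentrality parameter can be calculated from Result 1.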
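These information elements can be evaluated by replacing expectations with averages over a sample. The sketch below does that for a scalar *Z* under hypothetical settings that are not the paper's simulation design: three *X* components of which only the first is informative, an intercept as *S*, and γ = 0.5. It compares Ω when all columns of *W* are used with Ω when only the information-carrying columns are used, once for *Z* independent of (*X*, *S*) and once for *Z* correlated with a non-informative component. The function name `omega_model2` is hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [v] for row, v in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def H(t):
    return 1.0 / (1.0 + math.exp(-t))


def omega_model2(X, S, Z, kappa, eta, gamma, keep):
    """Plug-in efficient information for a scalar theta in model (2) at theta = 0.
    Sample averages replace expectations; W = (X, S), and only the columns of W
    listed in `keep` are treated as nuisance covariates."""
    n = len(Z)
    rows = []
    for x, s, z in zip(X, S, Z):
        xk = sum(a * b for a, b in zip(x, kappa))
        t = xk + sum(a * b for a, b in zip(s, eta))
        h1 = H(t) * (1.0 - H(t))                  # H^(1)(W'beta)
        d = z * (1.0 + gamma * xk)                # Z(1 + gamma X'kappa)
        w = list(x) + list(s)
        rows.append((h1, d, [w[j] for j in keep]))
    p = len(keep)
    i_tt = sum(h * d * d for h, d, _ in rows) / n
    i_tb = [sum(h * d * w[j] for h, d, w in rows) / n for j in range(p)]
    i_bb = [[sum(h * w[j] * w[k] for h, _, w in rows) / n for k in range(p)] for j in range(p)]
    return i_tt - sum(a * c for a, c in zip(i_tb, solve(i_bb, i_tb)))


random.seed(7)
n, gamma = 2000, 0.5
kappa, eta = [1.0, 0.0, 0.0], [-0.5]            # only X1 carries information; S is an intercept
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
S = [[1.0] for _ in range(n)]
Z_indep = [random.gauss(0, 1) for _ in range(n)]
Z_dep = [0.8 * x[1] + 0.6 * random.gauss(0, 1) for x in X]   # Z tied to non-informative X2
full, oracle = [0, 1, 2, 3], [0, 3]             # columns of W = (X1, X2, X3, S)
for name, Z in (("independent", Z_indep), ("correlated", Z_dep)):
    print(name, omega_model2(X, S, Z, kappa, eta, gamma, full),
          omega_model2(X, S, Z, kappa, eta, gamma, oracle))
```

In the independent case the two values should agree up to sampling error, in line with Theorem 1; in the correlated case the oracle value is larger, which by Result 1 means a larger noncentrality for the same λ.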

In our simulation study, with γ = 2 and θ = 0.05, the noncentrality parameter when using all 110 predictors is 0.42, while it is 0.44 for the oracle. Such a modest increase in noncentrality is reflected in the very modest power gains for the oracle in the second simulation. If instead we make the common correlation in the second simulation equal to 0.90, the noncentrality parameter is 0.20 and 0.22 for the 110 variables and the oracle, respectively.

Our main purpose has been to show that little is gained by having access to an oracle if the secondary variables are not highly correlated with the variables of interest. As a secondary matter, however, access to an oracle can sometimes improve power. Using our theory, we constructed cases where *X* had 15 components, only 2 of which were information-carrying. In addition, *Z* had three components, one of which was also a predictor. One of the information-carrying components of *X* was uncorrelated with *Z*, while the other had correlation > 0.85 with each component of *Z*.

We investigated two cases, one in which the information-carrying components of *X* had relative risks ≈ 4.0, and the other in which the relative risks were a more modest ≈ 0.5. In both cases there was a decided increase in power when an oracle was available. Disappointingly, the Adaptive Lasso with 10-fold cross-validation was only modestly more powerful than the approach that used all covariates, and much less powerful than the oracle. The Lasso was essentially equivalent to using all the covariates. It is possible, and indeed we hope, that newer oracle methods such as the Adaptive Elastic Net (Zou and Zhang, 2009), which deal with collinearity better than the Adaptive Lasso, will lead to improved performance in these cases, although at the time of writing no R software was available for this method. The elastic net (Zou and Hastie, 2005) is its non-oracle counterpart, and in these simulations it was essentially equivalent to the Adaptive Lasso.

We believe that using regularization methods such as the Adaptive Lasso is a natural idea in score testing for a primary variable when there are many secondary variables. Our simulations and empirical work demonstrate that if the secondary variables are not highly correlated with the variables of interest, or if they are not informative about the response, then little is gained by having access to an oracle. Our theory-based numerical exercises reveal that power can increase when highly informative secondary variables are strongly correlated with the primary variables of interest, a situation that may not occur commonly. In the settings where power gains are possible, however, the Adaptive Lasso with 10-fold cross-validation behaved disappointingly, which points to a topic for future research.

It would be interesting to find empirical examples of score testing where oracle methods actually do achieve significantly greater power in practice.

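Consider a general problem where the loglikelihood function is $\mathcal{L}(Y, X^{T}\kappa, S^{T}\eta, \theta, \zeta)$. The null hypothesis is *H*_{0}: θ = 0, and we consider contiguous alternatives of the form θ = θ_{n} = λ*n*^{−1/2}.

Subscripts denote derivatives, e.g., $\mathcal{L}_{\theta}(\cdot) = \partial\mathcal{L}(\cdot)/\partial\theta$ and $\mathcal{L}_{\theta\beta}(\cdot) = \partial^{2}\mathcal{L}(\cdot)/\partial\theta\,\partial\beta^{T}$, where *β* = (κ^{T}, η^{T}, ζ^{T})^{T}. With (•) = (*Y*, *X*^{T}κ, *S*^{T}η, 0, ζ), define $\mathcal{I}_{\theta\theta} = -E\{\mathcal{L}_{\theta\theta}(\bullet)\}$ and similarly for $\mathcal{I}_{\theta\beta}$, $\mathcal{I}_{\beta\theta}$ and $\mathcal{I}_{\beta\beta}$. Writing $(\hat\bullet_{i}) = (Y_{i}, X_{i}^{T}\hat\kappa, S_{i}^{T}\hat\eta, 0, \hat\zeta)$ for the *i*th observation evaluated at the maximum likelihood estimates under the null hypothesis, these components of the Fisher information are estimated as

$$\hat{\mathcal{I}}_{\theta\theta} = -n^{-1}\sum_{i=1}^{n}\mathcal{L}_{\theta\theta}(\hat\bullet_{i}), \quad \hat{\mathcal{I}}_{\theta\beta} = -n^{-1}\sum_{i=1}^{n}\mathcal{L}_{\theta\beta}(\hat\bullet_{i}), \quad \hat{\mathcal{I}}_{\beta\beta} = -n^{-1}\sum_{i=1}^{n}\mathcal{L}_{\beta\beta}(\hat\bullet_{i}).$$

The numerator of the score statistic is

$$\hat{S}_{n} = n^{-1/2}\sum_{i=1}^{n}\mathcal{L}_{\theta}(\hat\bullet_{i}). \qquad \text{(A.1)}$$

Define $\Omega = \mathcal{I}_{\theta\theta} - \mathcal{I}_{\theta\beta}\mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\beta\theta}$ and let $\hat\Omega = \hat{\mathcal{I}}_{\theta\theta} - \hat{\mathcal{I}}_{\theta\beta}\hat{\mathcal{I}}_{\beta\beta}^{-1}\hat{\mathcal{I}}_{\beta\theta}$ be its estimate. The score test statistic is $T_{n} = \hat{S}_{n}^{T}\hat\Omega^{-1}\hat{S}_{n}$.

**Result 3** *Let p*_{θ} *be the dimension of* θ. *Under contiguous alternatives* θ = θ_{n} = λ*n*^{−1/2},

$$T_{n} \to \chi^{2}_{p_\theta}(\lambda^{T}\Omega\lambda/2) \quad \text{in distribution},$$

*where* $\chi^{2}_{p_\theta}(\cdot)$ *is the noncentral chi-squared distribution.*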

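Here is a sketch of the argument for Result 3. Under the null hypothesis, it is well known that the numerator of the score statistic has the expansion

$$n^{-1/2}\sum_{i=1}^{n}\mathcal{L}_{\theta}(\hat\bullet_{i}) = n^{-1/2}\sum_{i=1}^{n}\big\{\mathcal{L}_{\theta}(\bullet_{i}) - \mathcal{I}_{\theta\beta}\mathcal{I}_{\beta\beta}^{-1}\mathcal{L}_{\beta}(\bullet_{i})\big\} + o_{p}(1),$$

where $(\hat\bullet_{i})$ and $(\bullet_{i})$ denote (•) for the *i*th observation evaluated at the null-hypothesis estimates and at *β*_{0}, respectively, and *β*_{0} is the true value of *β*.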

To apply Le Cam's third lemma to obtain the distribution of the numerator of the score statistic under the alternatives θ = θ_{n}, we note that, by a Taylor series expansion, under the null hypothesis,

(A.2)

This means that under the null hypothesis,
are jointly normally distributed with means 0 and
, respectively. Their variances are
and
, respectively, and their covariance is −Ωλ. Applying LeCam’s third Lemma shows that when θ = θ* _{n}*,
. It then follows that
, as claimed.

Result 1 is a simple consequence of these results.

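We merely need to show that $\mathcal{I}_{\theta\beta} = 0$, so that $\Omega = \mathcal{I}_{\theta\theta}$, and that $\mathcal{I}_{\theta\theta}$ depends on the distribution of (*X*, *S*) only through (*X*^{T}κ, *S*^{T}η).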

Recalling Assumption 2, we can assume that *E*(*Z*) = 0. Because of Assumption 1, under the null hypothesis,
say. Letting (•) = (*X*^{T}κ, *S*^{T}η, ζ), we can now compute the Fisher information matrix under the null hypothesis as

Since *Z* is independent of (*X*, *S*) and *E*(*Z*) = 0, it follows that $\mathcal{I}_{\theta\beta} = 0$, and hence that $\Omega = \mathcal{I}_{\theta\theta}$, completing the proof.

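Here are the details of the Chatterjee procedure for model (2). Let *β* = (κ^{T}, η^{T})^{T} and *W* = (*X*^{T}, *S*^{T})^{T}. In the logistic context, the normalized score for θ, evaluated at the null hypothesis θ_{0} = 0, is

$$\mathcal{S}(\gamma, \beta_{0}) = n^{-1/2}\sum_{i=1}^{n}\{Y_{i} - H(W_{i}^{T}\beta_{0})\}\,Z_{i}(1 + \gamma X_{i}^{T}\kappa_{0}). \qquad \text{(A.3)}$$

The idea is that, for each fixed γ, one estimates *β*_{0} = (κ_{0}, η_{0}) by maximum likelihood under the null model, calling that estimate $\hat\beta = (\hat\kappa^{T}, \hat\eta^{T})^{T}$. Let *H*^{(1)}(*x*) = *H*(*x*){1 – *H*(*x*)}. Define

$$\mathcal{I}_{\theta\theta}(\gamma) = E\{H^{(1)}(W^{T}\beta_{0})(1 + \gamma X^{T}\kappa_{0})^{2}ZZ^{T}\}, \quad \mathcal{I}_{\theta\beta}(\gamma) = E\{H^{(1)}(W^{T}\beta_{0})(1 + \gamma X^{T}\kappa_{0})ZW^{T}\}, \quad \mathcal{I}_{\beta\beta} = E\{H^{(1)}(W^{T}\beta_{0})WW^{T}\}.$$

Further define $\Omega(\gamma) = \mathcal{I}_{\theta\theta}(\gamma) - \mathcal{I}_{\theta\beta}(\gamma)\mathcal{I}_{\beta\beta}^{-1}\mathcal{I}_{\theta\beta}^{T}(\gamma)$. All these terms are estimated by replacing true parameters by their estimates under the null model and expectations by averages over the data.

Chatterjee et al. propose to reject the null hypothesis for large values of

$$T_{\max} = \max_{\gamma \in \Gamma}\, \mathcal{S}(\gamma, \hat\beta)^{T}\hat\Omega(\gamma)^{-1}\mathcal{S}(\gamma, \hat\beta), \qquad \text{(A.4)}$$

where Γ is the chosen range of γ. They show that, for each γ, under the null hypothesis $\mathcal{S}(\gamma, \hat\beta)^{T}\hat\Omega(\gamma)^{-1}\mathcal{S}(\gamma, \hat\beta) \to \chi^{2}_{p_\theta}$ in distribution, where *p*_{θ} is the dimension of θ. They also show how to compute p-values using only simulation, as follows. For *b* = 1, ..., *B*, let ε_{b1}, ..., ε_{bn} be randomly generated standard normal random variables. Define

$$\mathcal{S}_{b}(\gamma) = n^{-1/2}\sum_{i=1}^{n}\varepsilon_{bi}\{Y_{i} - H(W_{i}^{T}\hat\beta)\}\{Z_{i}(1 + \gamma X_{i}^{T}\hat\kappa) - \hat{\mathcal{I}}_{\theta\beta}(\gamma)\hat{\mathcal{I}}_{\beta\beta}^{-1}W_{i}\}, \qquad \text{(A.5)}$$

$$T_{b} = \max_{\gamma \in \Gamma}\, \mathcal{S}_{b}(\gamma)^{T}\hat\Omega(\gamma)^{-1}\mathcal{S}_{b}(\gamma). \qquad \text{(A.6)}$$

Then, asymptotically, under the null hypothesis, *T*_{b} has the same limit distribution as does *T*_{max} in (A.4). Hence, the p-value is just

$$B^{-1}\sum_{b=1}^{B} I(T_{b} \ge T_{\max}). \qquad \text{(A.7)}$$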
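The maximum-over-γ statistic and its simulation-based p-value can be sketched in plain Python for a scalar *Z*. This is an illustration under stated assumptions rather than the authors' code: the null model is fit by Newton-Raphson; Ω̂(γ) is the plug-in efficient information; the standard normal multipliers perturb the per-observation efficient-score contributions {*Y*_{i} − *H*(*W*_{i}^{T}β̂)}{*Z*_{i}(1 + γ*X*_{i}^{T}κ̂) − Î_{θβ}(γ)Î_{ββ}^{−1}*W*_{i}}; and the function name `max_score_test`, the γ grid, *B* = 200 and the simulated data are all hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [v] for row, v in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def H(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_null(W, y, iters=25):
    """Null-model logistic MLE of beta by Newton-Raphson."""
    b = [0.0] * len(W[0])
    for _ in range(iters):
        mu = [H(sum(u * v for u, v in zip(w, b))) for w in W]
        g = [sum((yi - m) * w[j] for w, yi, m in zip(W, y, mu)) for j in range(len(b))]
        info = [[sum(m * (1 - m) * w[j] * w[k] for w, m in zip(W, mu)) for k in range(len(b))]
                for j in range(len(b))]
        b = [bj + s for bj, s in zip(b, solve(info, g))]
    return b


def max_score_test(W, z, y, n_x, gammas, B=200, seed=0):
    """Max-over-gamma score statistic and multiplier-simulation p-value (scalar Z).
    The first n_x columns of each row of W are X; the rest are S (including an intercept)."""
    n, p = len(y), len(W[0])
    b = fit_null(W, y)
    lin = [sum(u * v for u, v in zip(w, b)) for w in W]
    xk = [sum(u * v for u, v in zip(w[:n_x], b[:n_x])) for w in W]   # X'kappa-hat
    r = [yi - H(t) for yi, t in zip(y, lin)]                           # null residuals
    h1 = [H(t) * (1 - H(t)) for t in lin]                              # H^(1) weights
    i_bb = [[sum(h * w[j] * w[k] for h, w in zip(h1, W)) / n for k in range(p)] for j in range(p)]
    psis, omegas = [], []
    for g in gammas:
        d = [zi * (1 + g * x) for zi, x in zip(z, xk)]                 # Z(1 + gamma X'kappa-hat)
        i_tt = sum(h * di * di for h, di in zip(h1, d)) / n
        i_tb = [sum(h * di * w[j] for h, di, w in zip(h1, d, W)) / n for j in range(p)]
        a = solve(i_bb, i_tb)                                          # I_bb^{-1} I_bt
        psis.append([ri * (di - sum(u * v for u, v in zip(a, w))) for ri, di, w in zip(r, d, W)])
        omegas.append(i_tt - sum(u * v for u, v in zip(i_tb, a)))

    def stat(eps):
        # max over gamma of S(gamma)^2 / Omega(gamma), S(gamma) = n^{-1/2} sum_i eps_i psi_i(gamma)
        return max((sum(e * q for e, q in zip(eps, psi)) / math.sqrt(n)) ** 2 / om
                   for psi, om in zip(psis, omegas))

    t_max = stat([1.0] * n)                                            # observed statistic
    rng = random.Random(seed)
    t_sim = [stat([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(B)]
    return t_max, sum(t >= t_max for t in t_sim) / B


rng = random.Random(3)
n, grid = 500, [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
W = [x + [1.0] for x in X]                                   # W = (X1, X2, intercept)
z = [rng.gauss(0, 1) for _ in range(n)]
y0 = [int(rng.random() < H(x[0] - 0.5)) for x in X]          # generated under H0
y1 = [int(rng.random() < H(x[0] - 0.5 + zi + 0.5 * x[0] * zi)) for x, zi in zip(X, z)]
res0 = max_score_test(W, z, y0, n_x=2, gammas=grid)
res1 = max_score_test(W, z, y1, n_x=2, gammas=grid)          # theta = 1, gamma = 0.5
print(res0, res1)
```

Because the null residuals are orthogonal to *W* at the maximum likelihood fit, the unperturbed statistic (all multipliers equal to 1) coincides with the score statistic in (A.4) on the grid. Under *H*_{0} the p-value should be roughly uniform over repeated data sets; with a strong effect, as in the second call, it is typically 0 at this *B*.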

^{*}Martinez was supported by a Postdoctoral Training grant from the National Cancer Institute (CA90301). Carroll's research was supported by a grant from the National Cancer Institute (R37-CA057030). Chatterjee's research was supported by a Gene-Environment Initiative (GEI) grant from the National Heart Lung and Blood Institute (NHLBI) and by the Intramural Research Program of the National Cancer Institute. This paper benefited from the constructive comments of two referees and an associate editor.

- Chatterjee N, Kalaylioglu Z, Moslehi R, Peters U, Wacholder S. Powerful multi-locus tests for genetic association in the presence of gene-gene and gene-environment interactions. American Journal of Human Genetics. 2006;79:1002–1016. doi: 10.1086/509704.
- Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96:1348–1360. doi: 10.1198/016214501753382273.
- Friedman J, Hastie T, Tibshirani R. glmnet: Lasso and elastic-net regularized generalized linear models. R package version 1.1-3. 2009. http://www-stat.stanford.edu/hastie/Papers/glmnet.pdf
- Goeman JJ. Penalized R package, version 0.9–24. 2009a.
- Goeman JJ. L1 penalized estimation in the Cox proportional hazards model. Biometrical Journal. 2009b;51. In press.
- Leeb H, Pötscher BM. Model selection and inference: facts and fiction. Econometric Theory. 2005;21:21–59. doi: 10.1017/S0266466605050036.
- Maity A, Carroll RJ, Mammen E, Chatterjee N. Powerful multi-locus tests for genetic association with semiparametric gene-environment interactions. Journal of the Royal Statistical Society, Series B. 2009;71:75–96. doi: 10.1111/j.1467-9868.2008.00671.x.
- Tibshirani R. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B. 1996;58:267–288.
- Zou H. The Adaptive Lasso and its oracle properties. Journal of the American Statistical Association. 2006;101:1418–1429. doi: 10.1198/016214506000000735.
- Zou H, Hastie T. Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society, Series B. 2005;67:301–320. doi: 10.1111/j.1467-9868.2005.00503.x.
- Zou H, Zhang HH. On the adaptive elastic-net with a diverging number of parameters. Annals of Statistics. 2009;37:1733–1751. doi: 10.1214/08-AOS625.
