PLoS Genet | v.8(11); 2012 November | PMC3497901

PLoS Genet. 2012 November; 8(11): e1003096.

Published online 2012 November 8. doi: 10.1371/journal.pgen.1003096

PMCID: PMC3497901

Peter M. Visscher, Editor

The University of Queensland, Australia

* E-mail: jwitte/at/ucsf.edu

The authors have declared that no competing interests exist.

This is an open-access article distributed under the terms of the
Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original author
and source are credited.

See "Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies" in volume 8, e1003032.

An important step in analyzing genetic association study data is deciding whether to adjust for covariates—those variables ancillary to the variants of interest. In particular, when testing for novel associations, should the statistical model also include known genetic or nongenetic covariates that are predictors of the trait (e.g., body mass index when studying type 2 diabetes)? Yes, if the covariates are also correlated with the primary variants but do not mediate their effects, because they may confound the genetic associations. Including them helps control bias and prevent false discoveries (Figure 1a). But the answer is less clear-cut if the covariates are not confounders.

When the trait of interest is quantitative, including a nonconfounding covariate
associated with the trait is often beneficial because it can explain some of the
variability in the outcome, thus reducing noise and increasing power to detect novel
genetic associations. On the other hand, when the trait is binary, including the
covariate can actually reduce power for case-control association studies; this is
shown in a recent paper by Pirinen et al. [1] and previous work [2]–[5]. Fortunately, all
is not lost. In this issue of *PLOS Genetics*, Zaitlen et al. [6] present a new
approach that addresses this problem by leveraging information on covariates to
increase power in association studies of binary traits.

How can ignoring covariate information increase power? Assume that we are studying
the potential association between a genetic variant and a binary trait. Moreover,
assume we have measured a genetic or environmental covariate associated with the
trait but independent of the variant of interest in the source population, so it is
not a confounder (Figure 1b). If we ascertain a random sample of study subjects,
then the variant of interest and covariate will remain independent. Here, the most
powerful model for assessing association *includes* the covariate (e.g., in a
logistic regression model) [1]. While
adding the covariate may increase the standard error of the variant association,
omitting it can bias the association towards the null hypothesis of no effect and
ultimately reduce power [1]–[5], [7].
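The bias towards the null from omitting a nonconfounding covariate can be checked with a small calculation. The sketch below is illustrative only: it assumes a binary variant and a binary covariate that are independent in the population, a logistic risk model, and hypothetical parameter values (`alpha`, `beta_g`, `beta_c`, `p_c` are names introduced here, not taken from the cited papers). It computes the odds ratio for the variant after averaging over the covariate, which is what an unadjusted analysis of a random sample would target.

```python
from math import exp, log


def expit(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-x))


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return log(p / (1.0 - p))


def marginal_log_or(alpha: float, beta_g: float, beta_c: float, p_c: float) -> float:
    """Log odds ratio for a binary variant G after averaging over a binary
    covariate C that is independent of G (so C is not a confounder)."""

    def risk(g: int) -> float:
        # Average the conditional risks over the covariate distribution.
        return ((1.0 - p_c) * expit(alpha + beta_g * g)
                + p_c * expit(alpha + beta_g * g + beta_c))

    return logit(risk(1)) - logit(risk(0))


if __name__ == "__main__":
    beta_g = log(1.5)  # hypothetical conditional odds ratio of 1.5 for the variant
    for covariate_or in (1.0, 2.0, 5.0, 20.0):
        m = marginal_log_or(alpha=-1.0, beta_g=beta_g,
                            beta_c=log(covariate_or), p_c=0.5)
        print(f"covariate OR {covariate_or:>4}: "
              f"conditional OR 1.50, marginal OR {exp(m):.3f}")
```

With these assumed values the marginal odds ratio equals the conditional one only when the covariate has no effect, and it moves closer to 1 as the covariate effect grows, even though the covariate is unrelated to the variant.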

However, most association studies do not select a random sample of study subjects,
but rather ascertain cases and controls from the source population. This
ascertainment process can create a correlation between the genetic variant and
covariate in the sample, because cases will be enriched for both risk genotypes and
high-risk covariate levels. Since these are independent in the source population,
they will remain conditionally independent among cases or controls; but the variant
and covariate will be correlated in the overall case-control sample (dashed line in
Figure 1c). In the presence of this induced correlation, *omitting* the covariate
from a logistic
regression model may be the most powerful approach. Indeed, including the covariate
could substantially increase the standard error of the genetic variant association
(i.e., due to the induced correlation), resulting in a larger power loss than might
arise from omitting the covariate and biasing the association towards the null
hypothesis.
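The induced correlation can be made concrete with an exact calculation on a toy model. This is a minimal sketch under stated assumptions: a binary variant and binary covariate that are independent in the source population, a logistic risk model for a rare disease, and an equal mix of cases and controls. All parameter values and function names are hypothetical choices for illustration.

```python
from math import exp, log, sqrt


def expit(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-x))


def joint_given_status(alpha, beta_g, beta_c, p_g, p_c):
    """Return {status: {(g, c): probability}} for binary G and C that are
    independent in the source population, with status 1 = case, 0 = control."""
    joint = {1: {}, 0: {}}
    for g in (0, 1):
        for c in (0, 1):
            p_gc = (p_g if g else 1.0 - p_g) * (p_c if c else 1.0 - p_c)
            risk = expit(alpha + beta_g * g + beta_c * c)
            joint[1][(g, c)] = p_gc * risk
            joint[0][(g, c)] = p_gc * (1.0 - risk)
    # Normalise within cases and within controls.
    for status in (0, 1):
        total = sum(joint[status].values())
        joint[status] = {k: v / total for k, v in joint[status].items()}
    return joint


def correlation(dist):
    """Pearson correlation between binary G and C under a joint distribution."""
    e_g = sum(p * g for (g, c), p in dist.items())
    e_c = sum(p * c for (g, c), p in dist.items())
    e_gc = sum(p * g * c for (g, c), p in dist.items())
    return (e_gc - e_g * e_c) / sqrt(e_g * (1 - e_g) * e_c * (1 - e_c))


if __name__ == "__main__":
    # Hypothetical values: rare disease, variant OR 1.5, covariate OR 5.
    joint = joint_given_status(alpha=-5.0, beta_g=log(1.5), beta_c=log(5.0),
                               p_g=0.3, p_c=0.3)
    # Case-control sample with equal numbers of cases and controls.
    pooled = {k: 0.5 * joint[1][k] + 0.5 * joint[0][k] for k in joint[1]}
    print(f"correlation among cases:    {correlation(joint[1]):+.4f}")
    print(f"correlation among controls: {correlation(joint[0]):+.4f}")
    print(f"correlation in pooled data: {correlation(pooled):+.4f}")
```

Under the logistic model the within-group independence is only approximate, so the case-only and control-only correlations come out close to zero rather than exactly zero, while the pooled sample shows a clearly positive correlation.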

Pirinen et al. [1] investigate this phenomenon in detail and show that the increase in power from omitting covariates is a function of disease prevalence and effect sizes. In particular, omitting a covariate can often improve power to detect genetic effects for diseases with prevalence below 2%, and even for diseases with prevalence as high as 10% when the covariate is a particularly strong risk factor.

Improving analyses by ignoring covariates seems counterintuitive, as they should provide some information. To extract value from covariates, Zaitlen et al. [6] developed a new method that uses existing evidence of covariate associations with the trait of interest, and trait prevalence, to increase power. This approach first builds a liability model using estimates of a covariate's independent effect in the form of trait prevalences at various levels of the covariate (e.g., type 2 diabetes prevalences by age). Then it evaluates the association between the genetic variant of interest and the liability model residuals (Figure 1d). In effect, the external information about covariate effects is used to distinguish high- and low-risk cases and controls. Tests of genetic variant associations with these quantitative residuals have more power than tests of genetic associations with the original binary trait.
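The two steps described above can be sketched as follows. This is not the implementation of Zaitlen et al.; it is a minimal illustration that assumes a standard liability threshold model in which the residual liability is standard normal and an individual is a case when the residual exceeds a threshold set by the trait prevalence at that individual's covariate level. The prevalences by age band, the toy sample, and the names `liability_residual` and `score_statistic` are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal residual liability (assumption)


def liability_residual(is_case: bool, prevalence: float) -> float:
    """Expected residual liability given case status and the trait prevalence
    at the individual's covariate level (truncated-normal mean)."""
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    density = STD_NORMAL.pdf(threshold)
    if is_case:
        return density / prevalence            # mean of Z given Z > threshold
    return -density / (1.0 - prevalence)       # mean of Z given Z <= threshold


def score_statistic(genotypes, residuals) -> float:
    """n * r^2, where r is the Pearson correlation between genotype and
    residual; roughly chi-square with 1 df when there is no association."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_r = sum(residuals) / n
    cov = sum((g - mean_g) * (r - mean_r) for g, r in zip(genotypes, residuals))
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    var_r = sum((r - mean_r) ** 2 for r in residuals)
    if var_g == 0 or var_r == 0:
        raise ValueError("genotypes and residuals must both vary")
    return n * cov * cov / (var_g * var_r)


if __name__ == "__main__":
    # Hypothetical external prevalences by age band (not real estimates).
    prevalence_by_age = {"<40": 0.02, "40-59": 0.08, "60+": 0.20}
    # Hypothetical toy sample: (risk-allele count, is_case, age band).
    sample = [(2, True, "<40"), (1, True, "60+"), (1, False, "60+"),
              (0, False, "<40"), (1, True, "40-59"), (0, False, "40-59")]
    residuals = [liability_residual(case, prevalence_by_age[age])
                 for _, case, age in sample]
    genotypes = [g for g, _, _ in sample]
    for (g, case, age), r in zip(sample, residuals):
        print(f"genotype={g} case={case!s:5} age={age:5} residual={r:+.3f}")
    print(f"score statistic: {score_statistic(genotypes, residuals):.3f}")
```

In this sketch a case from a low-prevalence stratum receives a larger residual than a case from a high-prevalence stratum, which is how the external covariate information separates high- and low-risk cases and controls before the genetic test is run.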

The value of Zaitlen et al.'s approach is demonstrated in several data sets with case-control and case-control-covariate ascertainment, where the selection probability for an individual to join the study depends on covariate levels, such as in matched studies or those with overrepresentation of low-risk cases. While covariate-based ascertainment of cases and controls can induce selection bias that must be addressed by including the covariate in a conventional regression model [8], the new method provides a potentially powerful alternative.

The authors show by application and simulation that the liability model approach increases association test statistics by 18% and 16% in comparison with logistic regression with or without covariates, respectively. Of course, this improvement hinges on having accurate external covariate information; one could envision scenarios where the external covariate data is so poor that using this approach would actually decrease power. One could also use covariate information discerned from a given dataset, but external information may be even better. A framework to propagate uncertainties through the multistage analysis of Zaitlen et al. would be useful to assess sensitivity to the quality of published or assumed trait prevalences and covariate effects, and to the estimation errors in the formation of the liability model and in the calculation of residuals. A starting point might be to repeat the analyses for a range of covariate-specific trait prevalences that bracket the actual published or assumed values.
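One simple form of the bracketing idea is to recompute the liability residuals while scaling an assumed prevalence up and down. The sketch below is only a starting point under a standard liability threshold model with a standard normal residual; the published prevalence of 0.08, the scaling factors, and the function names are hypothetical, and a full sensitivity analysis would rerun the association test for each setting.

```python
from statistics import NormalDist


def case_control_residuals(prevalence: float) -> tuple[float, float]:
    """Expected residual liability for a case and for a control at a given
    covariate-specific prevalence (truncated-normal means)."""
    normal = NormalDist()
    threshold = normal.inv_cdf(1.0 - prevalence)
    density = normal.pdf(threshold)
    return density / prevalence, -density / (1.0 - prevalence)


def bracket(published: float, factors=(0.5, 1.0, 2.0)) -> dict:
    """Residuals under prevalences that bracket the published value.
    Scaled prevalences are capped at 0.99 to stay inside (0, 1)."""
    return {f: case_control_residuals(min(published * f, 0.99)) for f in factors}


if __name__ == "__main__":
    published_prevalence = 0.08  # hypothetical value for one covariate level
    for factor, (case_r, control_r) in bracket(published_prevalence).items():
        print(f"prevalence x{factor}: case residual {case_r:+.3f}, "
              f"control residual {control_r:+.3f}")
```

If the residuals, and hence the downstream test statistics, change little across the bracket, the analysis is relatively robust to error in the assumed prevalence; large changes would signal that the quality of the external estimates matters.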

Zaitlen and colleagues have also developed a version of the liability model approach for when the covariates are genetic markers with known trait associations [9]. Future work might compare these novel liability methods to alternative approaches for inclusion of external information, such as Bayesian models with informative priors for the covariate effects. Moreover, schemes for weighted analyses [10] suggest other ways to potentially increase association study power.

In summary, if one undertakes a case-control association study and has information on covariates that are independent risk factors for a trait—and are not confounders—simply including them in a logistic regression model is not always the optimal approach for discovering genetic variants. Instead, more power may be gained by excluding them, by using the liability model approach of Zaitlen et al. [6], [9], or by applying other novel techniques to leverage information from such covariates.

The Witte laboratory is supported by National Institutes of
Health grants R01 CA88164, U01 CA127298, and U19 GM61390. The funder had no role
in the preparation of the article.

1. Pirinen M, Donnelly P, Spencer CCA (2012) Including known covariates can reduce power to detect genetic effects in case-control studies. Nat Genet 44: 848–851.

2. Robinson LD, Jewell NP (1991) Some surprising results about covariate adjustment in logistic regression models. Int Stat Rev 59: 227–240.

3. Neuhaus JM, Jewell NP (1993) A geometrical approach to assess bias due to omitted covariates in generalized linear models. Biometrika 80: 807–815.

4. Neuhaus JM (1998) Estimation efficiency with omitted covariates in generalized linear models. J Am Stat Assoc 93: 1124–1129.

5. Kuo CL, Feingold E (2010) What's the best statistic for a simple test of genetic association in a case-control study? Genet Epidemiol 34: 246–253.

6. Zaitlen N, Lindström S, Pasaniuc B, Cornelis M, Genovese G, et al. (2012) Informed conditioning on clinical covariates increases power in case-control association studies. PLoS Genet 8: e1003032. doi:10.1371/journal.pgen.1003032.

7. Xing G, Xing C (2010) Adjusting for covariates in logistic regression models. Genet Epidemiol 34: 769–771.

8. Rothman KJ, Greenland S, Lash TL (2008) Modern epidemiology. 3rd edition. Philadelphia: Lippincott Williams & Wilkins. pp. 175–179.

9. Zaitlen N, Pasaniuc B, Patterson N, Pollack S, Voight B, et al. (2012) Analysis of case-control association studies with known risk variants. Bioinformatics 28: 1729–1737.

10. Clayton D (2012) Link functions in multi-locus genetic models: Implications for testing, prediction, and interpretation. Genet Epidemiol 36: 409–418.
