
Genet Epidemiol. Author manuscript; available in PMC 2010 April 1.

Published in final edited form as:

Genet Epidemiol. 2009 April; 33(3): 275–280.

doi: 10.1002/gepi.20381

PMCID: PMC2657816

NIHMSID: NIHMS94330


Testing Hardy-Weinberg Equilibrium (HWE) in the control group is commonly used to detect genotyping errors in genetic association studies. We propose a likelihood ratio test for testing HWE in the study population using both case and control samples. This test incorporates underlying association models. Another feature is that, when we infer the disease-genotype association, we explicitly incorporate HWE or a possible departure from Hardy-Weinberg Equilibrium (DHWE) into the model. Our unified framework enables us to infer the disease-genotype association when a detected DHWE needs to be part of the model after causes for the DHWE are explored. Real datasets are used to illustrate the application of the methodology and its implication in genetic association studies. Our analysis and interpretation touch on genotyping errors, population selection, population stratification, or the study sampling plan, all delicate issues that could be the cause of DHWE.

Hardy-Weinberg Equilibrium describes the genotype distribution of a population that is large, self-contained, and randomly mating. The equilibrium can be summarized as follows: if *p* is the frequency of one allele (*A*) and *q* is the frequency of the alternative allele (*a*) at a biallelic locus, then the HWE-expected frequencies are *p*^{2} for the *AA* genotype, 2*pq* for the *Aa* genotype, and *q*^{2} for the *aa* genotype. The three genotypic proportions sum to 1, as do the allele frequencies (Hardy, 1908; Weinberg, 1908).
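The HWE-expected proportions can be checked numerically. The following is a minimal Python sketch; the function name `hwe_genotype_frequencies` and the example allele frequency are illustrative choices, not part of the original analysis software.

```python
def hwe_genotype_frequencies(p):
    """Return the HWE-expected frequencies (AA, Aa, aa) for allele-A frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p  # frequency of the alternative allele a
    return p * p, 2.0 * p * q, q * q


# Illustrative allele frequency (hypothetical value).
freqs = hwe_genotype_frequencies(0.7)
print(freqs, sum(freqs))
```

The three returned proportions sum to 1 for any valid *p*, since (*p* + *q*)^{2} = 1.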

Many methods have been developed to test HWE. Weir (1996) and Emigh (1980) provide summaries of these methods. A *χ*^{2} test is commonly used to assess a departure from HWE. Exact tests of a departure from HWE have been developed for studies with small sample sizes (Haldane 1954; Wigginton, Cutler, and Abecasis, 2005).
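As an illustration of the commonly used *χ*^{2} test in a single sample, the sketch below compares observed genotype counts with their HWE expectations and uses one degree of freedom. The function name is hypothetical, and the example counts are the BSND-V43I case genotypes reported later in the paper; this is not the authors' likelihood ratio procedure.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of HWE in one sample (1 degree of freedom)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Assumes both alleles are observed, so every expected count is positive.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = hwe_chi_square(155, 27, 4)
print(round(stat, 3), round(p_value, 3))
```

For these counts the p-value comes out close to 0.043, in line with the value the paper reports for the case sample.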

Testing HWE is commonly conducted for genotyping quality control (Gomes, Collins, *et al*., 1999; Xu, Turner, *et al*., 2002). Some view the testing as an essential step in genetic association studies (Xu, Turner, *et al*., 2002; Thakkinstian, McElduff, *et al*., 2005); however, others caution against such use (Nielsen, Ehm, and Weir, 1999; Wittke-Thompson, Pluzhnikov, Cox, 2005; Zou and Donner, 2006). Nielsen, Ehm, *et al*. (1999) point out that HWE is generally expected to be distorted in the case sample in the region of association. Zou and Donner (2006) suggest that testing for HWE should not be used as a tool for identifying genotyping errors when it is tested in a single sample. Wittke-Thompson, Pluzhnikov, Cox (2005) provide a framework to guide the interpretation of a DHWE for case-control studies. They suggest that if a DHWE in cases or in both cases and controls is detected, it does not necessarily imply genotyping errors. Rather than discarding the data, the underlying disease-genotype association should be investigated. The association may explain the observed DHWE. If not, other possible explanations such as “genotyping error, chance, failure of assumptions underlying Hardy-Weinberg expectations” should be explored. In their framework, they explicitly assume the genotype is in HWE in the population.

The current work develops a likelihood ratio test for testing HWE in the study population. Our test uses data from both case and control samples, and the procedure accounts for the underlying disease models. We estimate the parameters in the model by minimizing the deviance; thus the estimates are maximum likelihood estimates. The difference between the deviances of two nested models follows, asymptotically, a *χ*^{2} distribution. This forms a likelihood ratio test for the population HWE.

When we infer the disease-genotype association, HWE or a possible DHWE is explicitly incorporated into the model. A DHWE could be due to a variety of reasons such as genotyping errors, population selection, population stratification, or the sampling plan of the study. If these come into play, they are likely to affect both cases and controls. The purpose of the likelihood ratio test of the population HWE is to prompt the investigators to think about these issues in addition to possible genotyping errors. If a DHWE is detected and genotyping errors are ruled out, our unified framework enables us to model the association under DHWE, in contrast to the current practice in which testing HWE is an intermediate step in the analysis of association studies.

The rest of the manuscript is organized as follows. Section 2.1 summarizes common genetic disease models. In Section 2.2, we develop the likelihood ratio test for population HWE. In Sections 3.1 and 3.2, we demonstrate our methods in detail using data from two genetic association studies conducted at Vanderbilt University Medical Center. Then we revisit some examples discussed by Wittke-Thompson, Pluzhnikov and Cox (2005). We developed the analysis software in R and it can be obtained from the corresponding author.

We first summarize the common disease models presented in Wittke-Thompson, Pluzhnikov, Cox (2005). Then we develop a likelihood ratio test for testing HWE in the study population.

Wittke-Thompson, Pluzhnikov, Cox (2005) explicitly assume the susceptibility locus is in HWE in the study population. This assumption implies the genotype distribution follows

$$Pr\left(AA\right)={p}^{2},\phantom{\rule{1em}{0ex}}Pr\left(Aa\right)=2pq,\phantom{\rule{1em}{0ex}}Pr\left(aa\right)={q}^{2},$$

(1)

where *p* is the population frequency of the wild-type allele (*A*), and *q* is the population frequency of the disease-susceptibility allele (*a*).

Let *α* be the baseline disease penetrance in homozygotes (*AA*), and *Y* = 1, 0 denote the study outcome diseased or not. They present the following general disease model:

$$Pr(Y=1\mid AA)=\alpha ,\phantom{\rule{1em}{0ex}}Pr(Y=1\mid Aa)=\alpha \beta ,\phantom{\rule{1em}{0ex}}Pr(Y=1\mid aa)=\alpha \gamma ,$$

(2)

where *β* is the relative risk of disease for the heterozygotes *Aa* in reference to homozygotes *AA* and *γ* is the relative risk of disease for the homozygotes *aa*.

The prevalence of disease in the population is restricted as

$${K}_{p}={p}^{2}\alpha +2pq\alpha \beta +{q}^{2}\alpha \gamma $$

(3)

It is recognized that the disease prevalence *K*_{p} cannot be estimated in a case-control study; thus it has to be obtained through external studies.
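Because *K*_{p} is fixed from external sources, constraint (3) ties the baseline penetrance *α* to the remaining parameters. The sketch below solves (3) for *α*; the function name is hypothetical, and treating *α* as solved from the constraint (rather than estimated freely) is an assumption about the implementation. The inputs are the rounded estimates quoted for the two worked examples later in the paper.

```python
def baseline_penetrance(K_p, q, beta, gamma):
    """Solve K_p = p^2*alpha + 2pq*alpha*beta + q^2*alpha*gamma for alpha."""
    p = 1.0 - q
    weight = p * p + 2.0 * p * q * beta + q * q * gamma
    return K_p / weight


# Rounded estimates quoted in the paper's hypertension example.
alpha_htn = baseline_penetrance(K_p=0.20, q=0.11, beta=0.99, gamma=0.85)
# Rounded estimates quoted in the paper's FPAH example.
alpha_fpah = baseline_penetrance(K_p=0.00001, q=0.39, beta=2.06, gamma=1.05)
print(alpha_htn, alpha_fpah)
```

With these rounded inputs the results agree with the reported *α* values (0.2006 and 0.0000066) up to rounding, which suggests the reported estimates satisfy the prevalence constraint.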

In this framework, Wittke-Thompson, Pluzhnikov, Cox (2005) propose to estimate the parameters ** θ** = (

In this section, we first expand the common disease models to incorporate whether or not the genotype is in HWE in the population. Then we develop a test for the population HWE expressed as a null hypothesis that the susceptibility locus is in HWE in the population versus the alternative hypothesis that the susceptibility locus is not in HWE in the population. Here we define the study population as the population from which cases and controls are drawn and to which study findings will be generalized.

The null hypothesis *H*_{0} is expressible as (1), and under *H*_{0} the disease models are described in Section 2.1. The alternative hypothesis *H*_{a} can be expressed as the genotype distribution in the population

$$Pr\left(AA\right)={p}_{0},\phantom{\rule{1em}{0ex}}Pr\left(Aa\right)={p}_{1},\phantom{\rule{1em}{0ex}}Pr\left(aa\right)={p}_{2},$$

(4)

where *p*_{2} = 1 − *p*_{0} − *p*_{1}. Under *H*_{a},

$${K}_{p}={p}_{0}\alpha +{p}_{1}\alpha \beta +{p}_{2}\alpha \gamma $$

(5)

Table 1 summarizes a typical data set from a traditional unmatched case-control study. The objective is to explore the relationship between disease status and a single-locus 2-allele genotype denoted as *AA, Aa*, and *aa*.

Assuming common disease model (2), under either *H*_{0} or *H*_{a}, the conditional distributions of the genotype in cases and in controls are

$$Pr(AA\mid Y=1)=\frac{Pr\left(AA\right)\alpha}{{K}_{p}},\phantom{\rule{1em}{0ex}}Pr(Aa\mid Y=1)=\frac{Pr\left(Aa\right)\alpha \beta}{{K}_{p}},\phantom{\rule{1em}{0ex}}Pr(aa\mid Y=1)=\frac{Pr\left(aa\right)\alpha \gamma}{{K}_{p}}$$

(6)

and

$$\begin{array}{cc}\hfill Pr(AA\mid Y=0)& =\frac{Pr\left(AA\right)(1-\alpha )}{(1-{K}_{p})},\hfill \\ \hfill Pr(Aa\mid Y=0)& =\frac{Pr\left(Aa\right)(1-\alpha \beta )}{(1-{K}_{p})},\hfill \\ \hfill Pr(aa\mid Y=0)& =\frac{Pr\left(aa\right)(1-\alpha \gamma )}{(1-{K}_{p})},\hfill \end{array}$$

(7)

respectively.
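Equations (6) and (7) are applications of Bayes' rule and can be sketched in a few lines of Python. The function name and the example parameter values below are hypothetical; the genotype frequencies may come from either (1) or (4).

```python
def genotype_given_status(geno, alpha, beta, gamma):
    """Genotype distributions in cases (Y=1) and controls (Y=0).

    geno holds the population frequencies of (AA, Aa, aa).
    """
    penetrance = (alpha, alpha * beta, alpha * gamma)
    # Population prevalence, as in (3) or (5).
    K_p = sum(g * f for g, f in zip(geno, penetrance))
    cases = tuple(g * f / K_p for g, f in zip(geno, penetrance))
    controls = tuple(g * (1.0 - f) / (1.0 - K_p) for g, f in zip(geno, penetrance))
    return cases, controls, K_p


# Hypothetical recessive model (beta = 1) with HWE genotype frequencies for q = 0.3.
geno = (0.49, 0.42, 0.09)
cases, controls, K_p = genotype_given_status(geno, alpha=0.1, beta=1.0, gamma=2.0)
print(cases, controls, K_p)
```

Both conditional distributions sum to 1, and when *β* = *γ* = 1 they reduce to the population genotype distribution, which is why an association distorts HWE in cases even when the population is in HWE.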

Conditional on the study design parameters, *n*_{.1} cases and *n*_{.2} controls, the expected numbers of genotypes in cases are:

$$\mathrm{E}{n}_{11}={n}_{.1}Pr(AA\mid Y=1),\phantom{\rule{1em}{0ex}}\mathrm{E}{n}_{21}={n}_{.1}Pr(Aa\mid Y=1),\phantom{\rule{1em}{0ex}}\mathrm{E}{n}_{31}={n}_{.1}Pr(aa\mid Y=1)$$

The expected numbers of genotypes in controls are obtained by replacing *n*_{.1} with *n*_{.2} and the probabilities in (6) with those in (7).

We fit models by minimizing the deviance function

$$\Lambda \left(\theta \right)=2\sum _{i,j}{n}_{ij}\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\{{n}_{ij}/\mathrm{E}\left({n}_{ij}\right)\}$$

(8)

over the parameter space of ** θ**, where

We use the R function nlm to obtain parameter estimates in the models. The standard error of the estimates is obtained through the inverse of the Hessian matrix and through the delta method since proper transformations of the parameters are needed to facilitate the algorithm.
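The deviance (8) can be sketched in Python as follows. This is not the authors' R code: the function names are hypothetical, *α* is assumed to be solved from the prevalence constraint, and a coarse grid search over (*q*, *γ*) stands in for `nlm` purely for illustration, here for a recessive model (*β* = 1) under HWE. The counts are the BSND-V43I genotypes reported later in the paper.

```python
import math


def expected_counts(geno, beta, gamma, K_p, n_cases, n_controls):
    """Expected genotype counts in cases and controls under disease model (2)."""
    p0, p1, p2 = geno
    # Assumption: alpha is implied by the fixed prevalence K_p.
    alpha = K_p / (p0 + p1 * beta + p2 * gamma)
    pen = (alpha, alpha * beta, alpha * gamma)
    cases = [n_cases * g * f / K_p for g, f in zip(geno, pen)]
    controls = [n_controls * g * (1.0 - f) / (1.0 - K_p) for g, f in zip(geno, pen)]
    return cases, controls


def deviance(obs_cases, obs_controls, geno, beta, gamma, K_p):
    """Deviance 2 * sum n_ij * log(n_ij / E n_ij); zero counts contribute 0."""
    exp_cases, exp_controls = expected_counts(
        geno, beta, gamma, K_p, sum(obs_cases), sum(obs_controls))
    observed = list(obs_cases) + list(obs_controls)
    expected = exp_cases + exp_controls
    if any(e <= 0 for e in expected):  # parameters imply a penetrance >= 1
        return math.inf
    return 2.0 * sum(n * math.log(n / e) for n, e in zip(observed, expected) if n > 0)


def fit_recessive_under_hwe(obs_cases, obs_controls, K_p, steps=200, gamma_max=3.0):
    """Coarse grid search over q in (0, 1) and gamma in (0, gamma_max]."""
    best = (math.inf, None, None)
    for i in range(1, steps):
        q = i / steps
        p = 1.0 - q
        geno = (p * p, 2.0 * p * q, q * q)  # genotype frequencies under HWE
        for j in range(1, steps + 1):
            gamma = gamma_max * j / steps
            d = deviance(obs_cases, obs_controls, geno, 1.0, gamma, K_p)
            if d < best[0]:
                best = (d, q, gamma)
    return best


best_dev, best_q, best_gamma = fit_recessive_under_hwe((155, 27, 4), (408, 55, 15), K_p=0.20)
print(best_dev, best_q, best_gamma)
```

A gradient-based optimizer on transformed parameters (for example a logit scale for *q* and a log scale for *γ*) would be the more practical choice; the grid is used here only to keep the sketch dependency-free.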

Under *H*_{0} and *H*_{a}, we fit the same disease model (2), and the difference of the deviances follows a

In this section, we illustrate the application of our methodology and its implication in genetic association studies using several real datasets. The example in Section 3.1 explores a set of possible explanations for the observed DHWE. Section 3.2 discusses the potential impact of the sampling plan on the analysis. For both examples, our analysis results always prompted the investigators to reevaluate genotyping quality. After genotyping errors were ruled out, additional issues were explored as possible explanations for the DHWE in the study population. Consequently, the inference of the disease-genotype association would be made under the best-fit disease model.

We applied our methodology to an association study involving genetic variations in an accessory chloride channel subunit and hypertension in the Ghanaian population. BSND-V43I was identified as a common polymorphism in the non-Caucasian population. Functional examination of this variant demonstrated a partial loss-of-function variant when heterologously expressed with ClC-Kb in cultured cells. The BSND-V43I genotypes (*GG, AG, AA*) are (155, 27, 4) in the cases and (408, 55, 15) in the controls (Sile, Gillani, *et al*., 2007). In current practice, where HWE is separately tested in cases and in controls, both demonstrate a significant departure from HWE (*p* = 0.043, and *p* < 0.001, respectively). Applying the general disease model, the best fit is a model with *α* = 0.2006, *β* = 0.99, *γ* = 0.85, and *q* = 0.11 using the essential hypertension prevalence of *K*_{p} = 0.20 (based on World Health Organization report (2005)). Here

The estimated *β* indicates a recessive disease model. Therefore, we separately fit a recessive model under *H*_{0} and *H*_{a}. Results are presented in Table 2. The recessive model under

There are several possible explanations for the departure from HWE in this example with the BSND-V43I allele. Some of the explanations involve genotyping errors, population stratification, sampling methods, selection, non-random mating, and possibly chance.

Historically, deviation from HWE has been attributed to genotyping errors; however, the investigators had several measures in place to avoid such problems. These measures included an unstructured sample-numbering system with respect to cases and controls, and blanks and sequence-verified controls in each plate (Sile, Gillani, *et al*., 2007).

In addition, deviation from HWE could stem from population stratification or admixture. However, previous studies by Adeyemo, Chen, *et al*., (2005) demonstrated that these issues are negligible in the Ghanaian population that Sile, Gillani, *et al*., (2007) examined.

Furthermore, sampling methods could not explain the observed deviation from HWE. The samples were not ascertained with regard to any particular clinical or genetic phenotype or prior knowledge of any disease status. The exclusion criterion was the presence of an acute illness. We believe the cases and the controls are random samples.

Finally, selection appears to be a possible explanation. Unfortunately, there are no data regarding other markers in linkage disequilibrium (LD) in this region to further examine the issue of selection. Using electrophysiology patch clamping, Sile, Gillani, *et al*. (2007) demonstrated that BSND-V43I is a partial loss-of-function polymorphism. They concluded, based on their functional data, that susceptible subjects with this allele might be protected from developing hypertension. Based on their conclusion we think that selection might be an explanation for the observed deviation from HWE. Further studies examining patterns of variations and LD are needed to determine if selection is indeed present in this region. Additional possible explanations include non-random mating and chance, which we cannot evaluate in this study.

We also applied our methodology to an association study of TGF*β*1 Codon 10 Polymorphism and Familial Pulmonary Arterial Hypertension (FPAH). Dr. John Phillips III and his group genotyped TGF*β*1 Codon 10 on a cohort of 120 FPAH patients (probands) and 51 of their relatives. Every person in the cohort has the BMPR2 mutation. The hypothesis is that codon 10 *T* to *C* transition increases expression and circulating levels of TGF*β*1, thus increasing the chance for FPAH. The TGF*β*1 codon 10 SNP genotypes (*TT, CT, CC*) are (29, 78, 13) in the cases and (17, 28, 6) in the controls (Phillips III, Poling, *et al*., 2008). When tested separately, cases demonstrated a significant departure from HWE (*p* = 0.0004), but controls did not. Fitting the general disease model, the best fit is a model with *α* = 0.0000066, *β* = 2.06, *γ* = 1.05, and *q* = 0.39 using a FPAH prevalence of *K*_{p} = 0.00001 (Online Mendelian Inheritance in Man, 2007). Here

Based on preliminary analyses conducted by Dr. John Phillips III and his group, it appears that the association between this genotype and FPAH follows a dominant disease model. We separately fit a dominant model under *H*_{0} (i.e. in the population the TGF*β*1 Codon 10 SNP genotypes are in HWE) and *H*_{a} (i.e. in the population the TGF

Among the examples discussed by Wittke-Thompson, Pluzhnikov and Cox (2005), we focus on the studies that showed significant lack-of-fit by their best-fit recessive models. The study by Ozaki, Ohnishi, *et al*. (2002) had the heterozygote relative risk estimate of 1.024, so it is also included. Under the assumption that the genotype is in HWE in the population, the lack-of-fit by the best-fit genetic disease models suggests the genotype disease association is an unlikely explanation for the observed DHWE in patients or in controls.

We first fit recessive models, i.e., with *β* fixed at 1, under the hypothesis that the genotype is in HWE for these examples. Table 4 summarizes the results. The parameter estimates and the goodness-of-fit statistics are almost identical or very close to those reported in Table 1 of Wittke-Thompson, Pluzhnikov and Cox (2005). All the best-fit recessive models show significant lack-of-fit.

Fitting recessive disease models (*β* is fixed at 1) to ten case-control studies analyzed by Wittke-Thompson, Pluzhnikov and Cox (2005) under the null hypothesis *H*_{0}: the genotype is in HWE in the study population.

We then fit the same recessive models, but under the assumption that the population is not in HWE. Results are reported in Table 5. With two parameters, *p*_{0} and *p*_{1}, to represent the genotype frequencies, the allele frequency *q* is calculated as *p*_{1}/2 + (1 − *p*_{0} − *p*_{1}). The estimates of *q* and *α* are similar to those in Table 4, and some estimates of *γ* have changed significantly. As expected, the deviances are now smaller, and all but one demonstrate significant improvement over the models reported in Table 4. For the study with PubMed ID 11889073 (see Table 5), the recessive model still shows significant lack-of-fit and the investigator needs to look for possible reasons other than the DHWE in the population for an explanation of the lack-of-fit.

Fitting recessive disease models (*β* is fixed at 1) to ten case-control studies analyzed by Wittke-Thompson, Pluzhnikov and Cox (2005) under the alternative hypothesis *H*_{a}: the genotype is not in HWE in the study population. Table 4 has more study **...**

Comparing the models under *H*_{0} and *H*_{a} using the likelihood ratio test, we see that much of the lack-of-fit shown in the recessive disease models (Table 4) can be explained by assuming the genotypes are not in HWE in the population. The likelihood ratio tests are reported in Table 6.
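The comparison can be sketched as a difference of deviances referred to a *χ*^{2} distribution. The sketch below assumes one degree of freedom, because the alternative replaces the single allele frequency *q* with the two genotype frequencies *p*_{0} and *p*_{1}; the function name and the deviance values in the example are hypothetical and are not taken from Tables 4 to 6.

```python
import math


def lrt_population_hwe(deviance_h0, deviance_ha):
    """Likelihood ratio test of H0 (HWE) against Ha (no HWE), assuming 1 df."""
    stat = deviance_h0 - deviance_ha
    if stat < 0:
        raise ValueError("the nested H0 model cannot fit better than Ha")
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical deviances from fitting the same disease model under H0 and Ha.
lrt_stat, lrt_p = lrt_population_hwe(deviance_h0=12.5, deviance_ha=1.3)
print(lrt_stat, lrt_p)
```

A small p-value indicates that allowing a departure from HWE in the study population improves the fit of the chosen disease model.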

Likelihood ratio test of population HWE for ten case-control studies analyzed by Wittke-Thompson, Pluzhnikov and Cox (2005).

These findings raise questions for the investigators about the reason for the departure from HWE in their study population. Sections 3.1 and 3.2 present two detailed examples in which a set of possible explanations were explored to explain the departure. In addition to genotyping errors, we suggest the investigators look into similar issues for possible explanations of the DHWE. On the other hand, whether the genotype is in HWE in the study population plays an important role in making inference about the genotype disease association. Therefore, this assumption should be assessed explicitly in the model.

HWE is commonly tested separately in cases and in controls for genotyping quality control. Several researchers have expressed concern about this practice (Nielsen, Ehm, and Weir, 1999; Wittke-Thompson, Pluzhnikov, Cox, 2005; Zou and Donner, 2006). The current work proposes a likelihood ratio test for testing HWE using both case and control samples. If the problems that HWE testing is intended to detect, such as genotyping errors or population stratification, come into play, these problems are likely to impact both cases and controls. Rather than the current approach of separately testing HWE in cases and in controls as a middle step in the analysis of association studies, our methods test HWE in the study population. We explicitly incorporate HWE or a possible DHWE into the model when we infer the underlying disease-genotype association. If genotyping errors are ruled out and the DHWE is plausible, our methods provide a means to study the association. The observation that some of the estimates of *γ* in Table 5 changed significantly from the estimates in Table 4 underlines the message that the association estimates depend on the assumption.

Testing HWE in the study population also has implications beyond genotyping quality control. Some genetic methods depend on the assumption that the population is in HWE (Cheng and Chen (2005) and Cheng and Lin (2005)). Our methods provide investigators a useful tool to test HWE in the study population before they apply the analysis methods in those settings.

The examples in Section 3 illustrate the application of the methodology and its implication in genetic association studies. As demonstrated in the examples, a DHWE in the study population could be due to reasons other than genotyping errors, such as population stratification, population selection, the study sampling plan, *or failure of the assumptions underlying HWE*. Although our methods appear to carry a simple message, they touch on these delicate issues. We suggest a detected DHWE in the study population be investigated with this in mind.

One limitation of our methods is that we assume the disease prevalence is fixed. It is recognized that the disease prevalence cannot be estimated in a case-control study; thus it has to be obtained from external sources. In the example of Section 3.1, we also analyzed the data using essential hypertension prevalence of *K* = 0.15 and *K* = 0.25. Results were similar to those reported and therefore the conclusion is unchanged. In practice, we suggest investigators conduct sensitivity analyses using a range of plausible estimates of disease prevalence.

We acknowledge John Phillips III, *Scott Williams*, Chun Li, and Daniel Zelterman for helpful discussions. We also acknowledge John Phillips III for giving us access to the data set of TGF*β*1 codon 10 polymorphism and familial pulmonary arterial hypertension association study. This research was supported in part by the U.S. National Institutes of Health with grant RR00095 awarded to the General Clinical Research Center at Vanderbilt University Medical Center.

- Adeyemo AA, Chen G, Chen Y, Rotimi C. Genetic structure in four West African population groups. BMC Genet. 2005;6:38. [PMC free article] [PubMed]
- Agresti A. Categorical data analysis. John Wiley & Sons; New York; Chichester: 2002.
- Cheng KF, Chen JH. Bayesian models for population-based case-control studies when the population is in Hardy-Weinberg equilibrium. Genetic Epidemiology. 2005;28:183–192. [PubMed]
- Cheng KF, Lin WJ. Retrospective analysis of case-control studies when the population is in Hardy-Weinberg equilibrium. Statistics in Medicine. 2005;24:3289–3310. [PubMed]
- Emigh TH. A comparison of tests for Hardy-Weinberg equilibrium. Biometrics. 1980;36:627–642. [PubMed]
- Gomes I, Collins A, Lonjou C, Thomas NS, Wilkinson J, Watson M, Morton N. Hardy-Weinberg quality control. Annals of Human Genetics. 1999;63:535–538. [PubMed]
- Haldane JBS. An exact test for randomness of mating. Journal of Genetics. 1954;52:631–635.
- Hardy GH. Mendelian proportions in a mixed population. Science. 1908;28:49–50. [PubMed]
- Nielsen D, Ehm MG, Weir BS. Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus. American Journal of Human Genetics. 1999;63:1531–1540. [PubMed]
- Online Mendelian Inheritance in Man, OMIM (TM) Johns Hopkins University; Baltimore, MD: 2007. MIM Number: 178600. URL: http://www.ncbi.nlm.nih.gov/omim/
- Ozaki K, Ohnishi Y, Iida A, Sekine A, Yamada R, Tsunoda T, Sato H, Sato H, Hori M, Nakamura Y. Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nature Genetics. 2002;32:650–654. [PubMed]
- Phillips JA, III, Poling JS, Phillips CA, Stanton KC, Austin ED, Cogan JD, Wheeler L, Yu C, Newman JE, Dietz HC, Loyd JE. Synergistic heterozygosity for TGF*β*1 SNPs and BMPR2 mutations modulates the age at diagnosis and penetrance of familial pulmonary arterial hypertension. Genet Med. 2008;10(5) (in press) [PubMed]
- Sile S, Gillani NB, Velez DR, Vanoye CG, Yu C, Byrne LM, Gainer JV, Brown NJ, Williams SM, George AL., Jr. Functional BSND variants in essential hypertension. Am J Hypertens. 2007;20(11):1176–1182. [PubMed]
- Thakkinstian A, McElduff P, D'Este C, Duffy D, Attia J. A method for meta-analysis of molecular association studies. Statistics in Medicine. 2005;24:1291–1306. [PubMed]
- Weinberg W. In: On the demonstration of heredity in man, in Papers on Human Genetics. Boyer SH, editor. Prentice-Hall; Englewood Cliffs, NJ: 1908. 1963.
- Weir BS. Genetic Data Analysis II Methods for Discrete Population Genetic Data. Sinauer Associates, Inc.; Sunderland, Massachusetts: 1996.
- Wigginton JE, Cutler DJ, Abecasis GR. A note on exact tests of Hardy-Weinberg equilibrium. American Journal of Human Genetics. 2005;76:887–893. [PubMed]
- World Health Organization Regional Office for Africa Cardiovascular Disease in the African Region: Current Situation and Perspectives. AFR/RC55/12. 2005 June; 2005.
- Wittke-Thompson JK, Pluzhnikov A, Cox NJ. Rational Inference about Departures from Hardy-Weinberg Equilibrium. American Journal of Human Genetics. 2005;76:967–986. [PubMed]
- Xu J, Turner A, Little J, Bleecker ER, Meyers DA. Positive results in association studies are associated with departure from Hardy-Weinberg equilibrium: hint for genotyping error? Human Genetics. 2002;111:573–574. [PubMed]
- Zou GY, Donner A. The merits of testing Hardy-Weinberg equilibrium in the analysis of unmatched genetic case-control data: A cautionary note. Annals of Human Genetics. 2006;70:923–933. [PubMed]
