Search tips
Search criteria 


Logo of hheKargerHomeAlertsResources
Hum Hered. 2009 October; 69(1): 1–13.
Published online 2009 October 2. doi:  10.1159/000243149
PMCID: PMC2880731

A Unified Framework for Detecting Genetic Association with Multiple SNPs in a Candidate Gene or Region: Contrasting Genotype Scores and LD Patterns between Cases and Controls


It is critical to develop and apply powerful statistical tests for genetic association studies due to typically weak associations with complex human diseases or phenotypes. For population-based case-control studies with unphased multilocus genotype data, most of the existing methods are based on comparing genotype scores, e.g. allele frequencies, between the case and control groups. Another class of approaches are motivated to contrast linkage disequilibrium (LD) patterns between the two groups. It is expected that no single test can be uniformly most powerful across all situations, and different tests may perform better under different scenarios. A recent effort has been devoted to combining the above two classes of approaches, which however has some potential drawbacks. Here we propose a general and simple framework to unify the above two classes of approaches: it is based on the simple idea to incorporate LD measurements, in addition to genotype scores, as covariates in a logistic regression model, from which various tests can be constructed by taking advantage of the nice properties of the score statistics for the logistic model. It also has an advantage in easily accommodating covariates and other study designs. We use simulated data to show that our proposed tests performed well across several scenarios. In particular, in contrast to either of the two classes of the tests that is only powerful in detecting only one, but not both, of the two types of the distributional differences between cases and controls, our proposed tests are sensitive to both.

Key Words: Genome-wide association study, Linkage disequilibrium, Linkage disequilibrium contrast test, Logistic regression Multilocus analysis, Score test, SNP, Sum of squared score tests

1. Introduction

As genome-wide association studies have become feasible with many being underway, there have been intensive researches in developing powerful statistical tests to detect genetic association with multiple single-nucleotide polymorphisms (SNPs) in a candidate region. There are at least two compelling reasons for such developments. First, for complex human diseases or phenotypes, recent studies have confirmed usually weak genetic associations, often with estimated relative risks only from 1.1 to 1.5, hence powerful statistical tests are needed to discover genetic loci or variants weakly associated with a disease [Altshuler et al., 2008]. Second, it is known that there is no uniformly most powerful test for multiple parameters [Cox and Hinkley, 1974], as for association studies with multiple SNPs. Instead, different tests are expected to perform best under different scenarios. For example, the standard multivariate score test and a sum of squared score test can be each regarded as an estimated most powerful test [Pan, 2009].

Among the existing statistical tests for populationbased case-control studies with unphased multilocus genotype data, there are several classes of tests that aim to detect different aspects of distributional differences of genotypes between cases and controls. The first class includes some most popular approaches, such as Hotelling's T2 test [Fan and Knapp, 2003; Xiong et al., 2002] and single-locus-based tests [Roeder et al., 2005]. These tests are based on comparing genotype scores between the case and control groups. Another class includes a linkage disequilibrium (LD) contrast (LDC) test [Zaykin et al., 2006] and a modified LDC (mLDC) test [Wang et al., 2007]. The main idea of these tests is to capitalize on possible differences of LD patterns between the case and control groups [Hayes et al., 2004]. Because of their different targets, different classes of the tests have high power for different aspects of distributional differences of genotypes between the case and control groups. As expected, if the distributional difference between the two groups is only or mainly in their mean genotype scores, the first class of the tests is expected to be more powerful than the second class. Similarly, if the main difference lies only in their varying LD patterns, the second class will be more powerful, as to be confirmed by our simulation studies. For any real data, since the distributional difference could be in either or both aspects, it would be desirable to have a test that are powerful in detecting either and both types of the distributional differences. For this purpose, Wang et al. [2009] attempted to combine the ideas of the two classes of approaches and proposed a Normal-based likelihood ratio test (LRT) to compare both the mean vectors and covariance matrices of genotype scores between the two groups. However, there are several potential drawbacks associated with their method. First, a covariance matrix may not be most suitable for an LDC test as pointed out by Zaykin et al. [2006]. More importantly, it cannot incorporate a backgroundcorrected LD measurement as proposed by Wang et al. [2007], which was adopted in the mLDC test shown to perform better than the correlation-based LDC test. Second, their LRT is based on the working assumption that genotype scores have a multivariate Normal distribution, which does not hold due to the discreteness of any genotype score. Although the LRT test is still valid with this incorrect Normality assumption, the power of the test can be negatively influenced, as to be shown. Finally, due to the use of permutations and the need to calculate large covariance matrices and their determinants, the method is computationally slow.

Here we propose a general framework under logistic regression to contrast both genotype scores and LD patterns simultaneously. The basic idea is to incorporate some terms measuring LD patterns, in addition to genotype scores, as covariates into a logistic regression model. As for popular main-effects logistic regression, by taking advantage of some nice properties of the associated score statistics, we can construct a class of tests, not just a single one. Specifically, in addition to the usual multivariate score test, we can also apply a sum of squared score (SSU) test or its weighted version (SSUw), and univariate or single-locus-based minP (UminP) test. The SSU, SSUw and UminP tests have been shown to perform well across a wide range of scenarios in main-effects logistic regression models to compare genotype scores between the two groups [Chapman and Whittaker, 2008; Pan, 2009]. In this way, even only for the purpose of contrasting LD patterns between the two groups, we have an expanded set of LDC-type tests. For example, rather than taking a simple average of the terms measuring LD differences between the two groups as in the LDC and mLDC tests, we can take a weighted average by putting higher weights to those terms with smaller variances as implemented in the SSUw test, or take the most significant one as in the UminP test. Additionally and importantly, rather than depending on computationally intensive permutations as in the LDC, mLDC and LRT tests, we can use simple and accurate asymptotic distributions to calculate p values for these new tests. Furthermore, the proposed regression framework can easily accommodate covariates and other study designs.

A potential technical difficulty in incorporating LD measurements into logistic regression is the large number of regression coefficients induced by the added LD terms, leading to possibly a large number of degrees of freedom (DF) and thus possibly reduced power for a test. The SSU and SSUw tests, and to a lesser degree the UminP test, are robust to the large number of parameters to be tested. In fact, the SSU test (and thus SSUw) is closely related to an empirical Bayes test specifically developed to test parameters in a high-dimensional space [Goeman et al., 2006; Pan, 2009]. Hence, the robustness of these tests facilitates their use in a largely expanded logistic regression model including many LD-related terms. As to be shown using simulated data, if information on genetic association is mainly embedded in varying LD patterns between the case and control groups, these tests are among the most powerful. Otherwise, they may still maintain high power, and in particular are more powerful than the LDC test [Zaykin et al., 2006], mLDC test [Wang et al., 2007] and Normal-based LRT [Wang et al., 2009].

2. Methods

2.1. Data

We consider a population-based case-control study with a disease indicator as the binary trait Yi = 0 or 1, and some genotyped markers in a candidate gene or region Xi = (Xi1, …, Xik)'. We adopt the popular dosage coding for Xij : Xig,j = 0, 1 or 2 represents the copy number of one of the two alleles present at locus j for subject i, though other coding schemes can be equally adopted. Without loss of generality, for the given data {(Yi, Xi) : i = 1, …, m }, we assume that the first n0 subjects are controls with yi = 0, and the next n1 are cases with Yi = 1. The study goal is to test whether there is any association between the trait and any markers.

2.2. Statistical Tests Based on Contrasting Genotype Scores

2.2.1. Logistic Regression Model

Many tests are based on a main-effects logistic regression model:


for which we would like to test the null hypothesis H0,1 :β = (β1, β2, …, βk)' = 0 versus H1,1 :β ≠ 0.

2.2.2. Multivariate Score Test

To test H0,1, a multivariate score test can be constructed with the score statistic


with its covariance matrix consistently estimated by the expected Fisher information matrix






The test statistic is


which, under H0, has an asymptotic chi-squared distribution χ2r with degrees of freedom r = rank(V), typically r = k. A potential problem with the test is that, with many SNPs, the test can be low-powered because of the cost of the large DF.

The above score test is closely related to the generalized Hotelling's T2 test [Fan and Knapp, 2003; Xiong et al., 2002] with test statistic






are the mean genotype scores for the control and case groups respectively, and S is the pooled sample covariance matrix. On the other hand, as shown by Clayton et al. [2004], we have


Hence, TSco and TH only differ in how the covariance matrix of the mean difference between the two groups is estimated.

2.2.3. Single-Locus-Based Test

The multivariate score test involves the use of a possibly large covariance matrix, which may cause problems. A single-locusbased test would test individual SNPs one-by-one, e.g., by univariate score tests, then combine the individual tests. The most popular and simplest is the so-called univariate minP (UminP) test that takes the minimum p value of the individual tests, and then do a multiple testing adjustment. Equivalently, one can combine individual score test statistics by their maximum:


where Uj is the j-th component of U and vj = Var (Uj) is the j-th diagonal element of V.

A multiple test adjustment based on either permutation or the Bonferroni method is commonly used. Because the Bonferroni adjustment is known to be conservative, it is more common to use a permutation method by permuting Y. Here we propose using a faster simulation-based approach [Seaman and Muller-Myhsok, 2005; Chapman and Whittaker, 2008] to estimate the asymptotic null distribution. First, because the asymptotic null distribution of U is known to be U ~ N (0, V), we simulate B iid copies of U(b) 's from N (0, V) with b = 1, 2, …, B. Second, based on each U(b), we calculate T(b)UminP = max1 ≤ jk U(b)2j/vj. Third, the p value of the UminP test is simply b=1BI[TUminP<TUminp(b)]/B. We used B = 1000 throughout.

In our experience, the above simulation-based method is much faster than the permutation-based method. In addition to permuting the data, the permutation method requires refitting the model or re-calculating the test statistic repeatedly from each permuted dataset, which is typically more time-consuming than drawing a random sample from the null distribution of the test statistic. With the score test here, due to the available closed-form of the score statistic U (2) and its covariance matrix V' s invariance to permutation, the difference in computing time between the two methods is not dramatic, though in a simulation to be discussed later, the simulation method was 50% and 75% faster than the permutation method for sample sizes m = 1000 and m = 2000 respectively. If the Wald test is used, the difference will be much larger: the permutation method requires repeatedly fitting a regression model, each in an iterative algorithm to estimate β, while one only needs to fit the model once for the simulation method.

2.2.4. Sum of Squared Score Tests: SSU and SSUw

Next we describe two tests based on the multivariate score statistic. The key feature of the two tests is their ignoring the full or non-diagonal elements of the covariance matrix of the score statistic. The first one is to simply use the sum of the squared scores as the test statistic


This test is equivalent to the permutation-based version of the test of Goeman et al. [2006]. Although the Goeman test was derived under an empirical Bayes framework to test on a large number of parameters, as arising in microarray gene expression data, Chapman and Whittaker [2008] and Pan [2009] found that the Goeman test worked impressively well across a wide range of scenarios for the main-effects logistic model (1).

A weighted form of the SSU test is the SSUw test with


Hence, compared to the UminP test that takes the maximum of the individual univariate score test statistics, the SSUw takes their sum (or equivalently, average). The SSUw test can be interpreted as an estimated most powerful test [Pan, 2009], which also partially explains the good performance of SSU. Often SSU and SSUw perform similarly, but for some situations SSUw can be more or less powerful as to be shown here.

Asymptotically, each of the two test statistics is a quadratic form of Normal variates, Q = U'W−1 U, with W = I or W = Diag(V) respectively. It is well known [e.g. Johnson and Kotz, 1970, p. 150] that the distribution of Q is a weighted sum of k independent chi-squared variates with DF = 1, ∑kj = 1 cjχ21, where cj's are the eigenvalues of VW−1. Furthermore, by the results of Zhang [2005], ∑kj = 1 cjχ21 can be well approximated by aχ2d + b with


To calculate a p value, for example, for the SSU test, we use


2.3. Statistical Tests Based on Contrasting LD Patterns

Genetic associations can be revealed based on contrasting the LD patterns, e.g. measured by composite correlation matrices, between the case and control groups [Abecasis and Cookson, 2000; Hayes, 2004; Nielsen et al., 2004; Schaid, 2004; Zaykin et al., 2006]. For the dosage coding, the composite correlation between two loci can be estimated by their Pearson correlation coefficient. Suppose that the Pearson correlation matrices for the control and case groups are R0 = (r0jl) and R1 = (r1jl) respectively; for example,




and s0j are the sample mean and sample standard deviation of the genotype scores at locus j for controls, respectively. An LD contrast (LDC) test is constructed with


and its p value is obtained based on permutation by shuffling the disease labels Yi 's.

Wang et al. [2007] proposed a modified LDC (mLDC) test to better account for background LDs. The method works in the following steps. First, a linear mixed-effects model is fitted for the case and control groups separately. For example, for controls, we have


where γj is a fixed effect of locus j, bi ~ N (0, σb) is an iid random effect for subject i, and eij ~ N (0, σe) is an iid random error term. Second, based on the above fitted model, we obtain the best linear unbiased predictor (BLUP) of Xij, say Xij. Then we form the BLUP-corrected residuals, zij=Xij-Xij. Note the different use of the sample mean across all the subjects in the control or case group in the LDC test. Third, we form the cross-products of the residuals:


and their corresponding matrices Λ0-(λjl0) and Λ1=(λjl1). Finally the test statistic is


2.3. Testing on Differences of Means and of Covariance Matrices

It can be shown that the generalized Hotelling's T2 test assesses the mean difference of the genotype scores between the control and case groups, while the LDC test compares their correlation matrices. As noted by Zaykin et al. [2006], rather than the Pearson correlation matrix, the covariance matrix of each group can be also used, though the latter did not work as well as the former. Based on this observation, Wang et al. [2009] proposed a Normal-based likelihood ratio test (LRT) to compare the mean and covariance matrix differences simultaneously. Specifically, suppose that the pooled and group-specific sample means are


and the sample covariance matrices are


Then, the test statistic is


A permutation procedure is used to obtain its p value.

Wang et al. [2009] also proposed using principle component analysis (PCA) to reduce the dimension of the genotype vector Xi. Specifically, one applies PCA to the pooled sample and retains only the first k1 PCs explaining at least 85% of the total variation. Then the LRT is applied to the k1 PCs as before. The resulting test is called LRTpc.

As a result of dimensional reduction, the LRTpc test has a smaller DF and thus possibly higher power than the LRT. Further-more, for some data, the genotype scores of some linked SNPs may be linearly dependent, leading to a singular covariance matrix and consequently non-applicability of the LRT.

2.4. A Unified Framework: Logistic Regression

There are several potential drawbacks with the LRT and LRTpc tests of Wang et al. [2009]. First, the tests use sample covariance matrices, not Pearson's correlation matrices (to estimate composite correlations) as preferred by Zaykin et al. [2006]. More importantly, it is not clear how their tests can use BLUP-corrected residuals as in the mLDC test advocated by Wang et al. [2007]. As shown by [Wang et al. 2007, 2009], the mLDC test performed better than the LDC test, providing a compelling reason to use BLUP-corrected residuals, not the sample covariance matrices. Second, the LRT and LRTpc tests are based on the Normality assumption on the distribution of the genotype scores. Given each genotype score at any locus has only three possible values, its distribution cannot be Normal. Although the non-Normality does not automatically render the LRT and LRTpc tests invalid, their power can be negatively influenced by this incorrect working assumption. Third, the (asymptotic) null distributions of the two test statistics are unknown, hence computationally demanding permutations have to be used to calculate p values. Since for each permuted dataset, it is time-consuming to calculate the sample covariance matrices and their determinants, the LRT and LRTpc are much slower than any of the other tests considered in this article. Here we provide an alternative that overcomes the above three shortcomings of the LDC and LDCpc tests. The main idea is to introduce some relevant terms into a logistic regression model, and the corresponding score statistic will automatically contrast the means of these terms, including both genotype scores and LD patterns, between the case and control groups. Accordingly, not just a single test, but a class of tests as for the main-effects logistic model (1) can be constructed. First, we consider a simple and straightforward way by incorporating two-way interactions into a Full logistic regression model:

Logit Pr(Yi-1)=β0+j=1kXijβj+jlXijXilβj,l,

and test a new null hypothesis H0,2 : β1 = … = βk = β1,2 = … = βk − 1, k = 0 with any of the tests introduced in section 2.2, for all of which we can quickly calculate their p values either by their asymptotic null distributions or by simulation.

As for a main-effect shown earlier, the score statistic for an interaction, say for β1,2, is


which contrasts the mean cross-products between the two groups in the same spirit of the LDC test. Conceptually, the idea is related to using an LD measure to represent interactions between two unlinked loci [Zhao et al., 2006]. More generally, rather than using the two-way interactions, we can add various LD measurements, e.g. the crossproducts of the BLUP-corrected residuals into an LD+main-effects-logistic regression model:


and test H0,2 : β1 = … = βk = β1,2 = … = βk −1,k = 0 with any of the tests introduced in section 2.2. It is noted that the score statistic U = (U'β,(U'ββ)' contains two parts: the first part Uβ is for main effects parameters (β1, …, βk)', comparing the mean difference of the genotype scores between the two groups, and the second part Uββ for cross-product terms contrasts the LD patterns.

We can also construct tests based on contrasting LD patterns only, for example, with an LD-only logistic regression model:


and test H0,3 : β1,2 = … = βk − 1,k = 0. In particular, the test statistic TSSU for H0,3 is the same as TmLDC (up to a constant), but they differ in how to approximate their null distributions: the former is based on its asymptotic distribution while the latter is based on computationally more intensive permutations. Regardless of their different ways in estimating the null distributions, the analogy between our proposed tests based on logistic regression and the LDC/mLDC tests is the same as that between the logistic regression-based score test and the generalized Hotelling's T2 test, as discussed before.

An advantage of our proposed logistic regression framework is that it provides flexibility in incorporating various ways to combine the information in genotype scores and LD patterns. Not only various tests can be directly applied to test H0,2 in the LD+main-effects model (11), but also we can combine several tests, some of them test H0,1 in the main-effects model (1) to contrast genotype scores while others test H0,3 in the LD-only model (12) to contrast LD patterns. There is also flexibility in how to combine. Here for illustration, we will use the minP method that takes the minimum of the p values from the two SSU tests applied to models (1) and (12) respectively, then use the Bonferroni correction for multiple testing, as suggested by a reviewer. Specifically, if the p values from the SSU tests applied to the two models are PSSU,Main and PSSU,LD, then the p value of the minP test is


Note that it seems possible to apply the simulation method as discussed in section 2.2.3 to estimate the null distribution of the minP method to avoid the conservativeness of the Bonferroni correction, and it is under our current investigation.

A summary of the various methods is offered in table table1.1. For our proposed logistic regression, each of the listed tests in A) can be applied to each model listed in B).

Table 1
A summary of various methods

3. Results

3.1. Simulation I: Haplotype Effect Model

3.1.1. Simulation Set-Ups

We performed a simulation study following the setups given in Wang et al. [2009] with k = 10 marker SNPs and n1 = 500 cases and n0 = 500 controls. First, we generated a latent vector from a multivariate Normal distribution with a compound symmetry (CS) correlation structure with correlation coefficient ρ0; that is, the correlation between the two latent variables at any two loci j and l is ρil = ρ0. Second, the latent vector was dichotomized to yield a haplotype with minor allele frequencies (MAFs) at various loci randomly chosen between 0.1 and 0.4. Third, for two independently generated haplotypes h1 = s1,1s1,2s1,10 and h2 = s2,1S2,2s2,10 for any subject, if we coded st,j = 1 or 0 for the minor or major allele at any locus j, the non-null haplotype effects on a continuous liability trait y* were gt=|j=110st,j-5|/5 for t = 1 and 2; specifically, we had y* = g1 + g2 + e for a non-null model, and y* = e for a null-model, where e ~ N (0, σ) is a random error, and controlled noise level in the data. Fourth, a disease status Y was generated as the binary indicator I (y* 1 1.64). For each set-up, we simulated 1000 datasets, from which we obtained an empirical size or power for each test as its proportion of incorrectly or correctly rejecting its null hypothesis respectively. With 1000 independent replications, the Monte Carlo standard error of an empirical size or power, say pˆ,ispˆ(1-pˆ)/10000.016.

3.1.2. Simulation Results

Type I Error Rate. For the null models with various values of ρ0, all the tests except the multivariate score test based on the LD-only or LD+main-effects-logistic model maintained their Type I error rates close to the nominal significance level at 0.05 (table (table2,2, ,3),3), but the above two score tests could have much larger Type I error rates (not shown). For example, for the LD-only model, the test sizes of the score test increased from 0.067 for ρ0 = 0 to 0.107 for ρ0 = 0.8. For their inflated Type I error rates, we will not consider the two score tests in the LD-only and LD+main-effects models in the following comparisons.

Table 2
Empirical sizes and powers of various tests with nominal significance level 0.05 for simulation I
Table 3
Empirical sizes and powers of various tests with nominal significance level 0.05 for simulation I

Power. We can draw the following conclusions consistently across all scenarios with three noise levels σ. First, we consider how power changed with correlation coefficient ρ0. For the LRT, LRTpc and various LDC-type tests, the power increased with ρ0, reflecting increasing LD differences. In contrast, for other tests comparing genotype scores with or without contrasting LD patterns between the two groups, as ρ0 was increased, their power first increased, then decreased. Second, for a small ρ0 (e.g. ρ0 = 0), the SSU test (closely followed by SSUw) based on the Full-logistic model was the overall winner; for an intermediate ρ0, the SSU (or SSUw) tests based on either the Full model, Main-effects model, or LD+main-effects model performed similarly and were best, but as ρ0 was increased, the SSU test based on the LD+main-effects model was the winner; for a large ρ0, the SSU test based on the LD-only logistic model had the highest power. Third, between the LRT and LRTpc tests, the former won for a large ρ0 while the latter was the winner for a small ρ0. Fourth, as shown by Wang et al. [2007], the mLDC test consistently performed better than the LDC test, though the SSU test in the LD-only model was much better than the mLDC test. Fifth, for a smaller ρ0, the SSU test in the LD+main-effects model had an edge over the minP test, but as ρ0 increased, the power of the latter exceeded that of the former.

In summary, when LD was weak, the information about the trait-genotype association was mainly contained in genotype score differences, for which the SSU or SSUw test based on a logistic regression model with main-effects (and possibly other high-order terms) worked best. On the other hand, as the strength of LD was increased, there was a stronger difference in LD patterns, but not in genotype scores, between the two groups, for which case the SSU or SSUw test based on either LD-only or LD+main-effects logistic regression was the most powerful.

To gauge the overall performance, we averaged the power of seven representative tests over the values of ρ0 (fig. (fig.1).1). For each noise level σ, the minP test was the overall winner, followed closely by the SSU test in the LD+main-effects model, and then by the SSU test in the main-effects model.

Fig. 1
Average power in simulation I for each of methods 1–7, corresponding to the LRT, Full-SSU test, Main-SSU test, mLDC test, LD-only-SSU test, LD+Main-SSU test and minP test.

3.2. Simulation II: Two-Locus Disease Model

3.2.1. Simulation Set-Up

To mimic real LD patterns, we used haplotype data of Wang et al. [2009], containing two segments on chromosome 21 based on the CEU HapMap samples [Thorisson et al., 2005]. Wang et al. [2009] (table (table1)1) listed the haplotypes and their frequencies in the two gene regions, each with 7 SNPs. The first region included 16 different haplotypes, of which three haplotypes had a cumulative frequency of 73%. The second one had only 13 haplotypes with the two most frequent having a cumulative frequency of 69%. We combined two independently sampled haplotypes to obtain a vector of marker genotypes for each subject. As in Wang et al. [2009], we used the third SNP in each of the two regions as the causal SNPs, say u1 and u2, with MAF equal to 0.14 and 0.066 respectively, and simulated a binary trait using a dominant genetic model:


where Ii,u1 and Ii,u2 were binary indicators of the presence of the minor alleles in the two disease-causing SNPs for subject i.

For the null model, we used θ1 = θ2 = θ12 = 0. We considered four types of non-null models [Chatterjee et al., 2006]: (i) a purely epistatic model with θ1 = θ2 = 0 and θ12 = θ > 0; (ii) an additive model for odds ratio (OR) with θ = (θ1, θ2) and θ12 = θ1 + θ2 + log(eθ1 + eθ2 − 1); (iii) a multiplicative OR model (i.e. a main-effects model with only non-zero marginal effects and zero interacting effects) with θ1 = θ2 = θ > 0 and θ12 = 0; (iv) a cross-over model with θ1 = log(0.9), θ2 = 0 and θ12 = θ > 0. For all the models, we used θ0 = −log(9) ≈ −2.2. For each type of the true genetic model, we varied the value of θ.

We excluded the two causal SNPs from the observed data, and combined the remaining 12 marker SNPs as in a candidate region. We used a sample size of n1 = 500 cases and n0 = 500 controls for each simulated dataset; 1000 replicates were used for each simulation set-up, from which we obtained an empirical size or power for each test as its proportion of incorrectly or correctly rejecting its null hypothesis.

3.2.2. Simulation Results

Type I Error Rate. For the null model, each of the tests seemed to maintain a correct test size: the Type I error rates were always close to the nominal significance level at 0.05 (table (table4,4, ,55).

Table 4
Empirical sizes and powers of various tests with nominal significance level 0.05 for simulation II
Table 5
Empirical sizes and powers of various tests with nominal significance level 0.05 for simulation II

Power. First, the tests based on only contrasting LD patterns appeared to be always low powered, while those based on (only or partly) comparing mean genotype scores seemed to have higher power, suggesting that the information content for genetic associations was mainly contained in mean genotype scores. Second, for the purely epistatic model and cross-over model, the SSUw test based on the Full model was the overall winner, closely followed by the UminP test based on the Full model. On the other hand, for the other two underlying genetic models, the multivariate score test based on the main-effects model seemed to be the overall winner, followed by the SSUw and possibly SSU tests. Third, although the tests based on LD+main-effects model were not most powerful, they had much higher power than the LDC and mLDC tests, and than the LRT and LRTpc tests. Fourth, between the SSU test for the LD+main-effects model and the minP test, neither dominated the other: each performed better than the other for two genetic models.

In overall, averaged over the various parameter values in each genetic model, the power of each of seven representative tests is shown in figure figure2.2. It is clear that, due to the presence of interactions, the SSU test for the Full model was the overall winner, closely followed by the SSU for the main-effects model, the minP test or the SSU for the LD+main-effects model.

Fig. 2
Average power in simulation II for each of methods 1–7, corresponding to the LRT, Full-SSU test, Main-SSU test, mLDC test, LD-only-SSU test, LD+Main-SSU test and minP test.

3.3. Simulation III: HapMap Data for Gene CHI3L2

3.3.1. Simulation Set-Up

We conducted a simulation study based on real LD patterns within the CHI3L2 gene as observed in the HapMap data. We downloaded the SNPs of the CHI3L2 gene for the 90 CEU (Utah residents with ancestry from northern and western Europe) individuals from the HapMap site in June 2008. As in Wang and Elston [2007], first, we excluded SNPs with MAF ≤ 0.2, leaving 23 SNPs. Second, we did a single imputation for each of the missing genotypes by randomly drawing an observed genotype at the same locus. Third, we deleted redundant SNPs that were perfectly correlated with other SNPs, leading to 17 remaining SNPs. Fourth, we repeatedly sampled (with replacement) subjects from the 90 CEU individuals. As Wang and Elston, we chose the SNP rs2182114 as disease causing: if its genotype score was Xi0 for subject i, we generated a disease indicator Yi from a logistic regression model


where we chose β0 = −log 4 to give a background (i.e. not caused by the SNP) disease probability of 0.2, and the association strength between the disease and causal SNP was reected by the odds ratio (OR), which ranged from 1 (i.e. no association) to 1.5. Finally, following the case-control design, we sampled n1 = 500 cases (with Yi = 1) and n0 = 500 controls (with Yi = 0). We excluded the diseasecausing SNP when applying various statistical tests. For each set-up, we simulated 1000 datasets.

3.3.2. Simulation Results

Type I Error Rate. For the null model with OR = 1, each of the tests seemed to maintain correct test size with the Type I error rate close to the nominal significance level of 0.05 (table (table6),6), Due to the linear dependence among the SNPs, no result was available for the Normal-based LRT method.

Table 6
Empirical sizes and powers of various tests with nominal significance level 0.05 for simulation III

Power. The Full logistic regression model gave the highest power, closely followed by the Main-effects model and LD+main-effects model. Although there were no explicit interaction terms in the true model (15), due to the connection between an interaction and LD [Zhao et al., 2006], the Full model, as well as the LD+main-effects model, performed well for capturing distributional differences in both genotype scores and LD patterns between the two groups. The Main-effects model also worked well, presumably because the main distributional difference was in genotype scores and it gained power from its small DF. The LDC-type tests all had low power, though the mLDC test and the LD-only model performed better than the LDC test. Across the various values of OR, the relative performance of the methods was consistent, as shown in figure figure33.

Fig. 3
Power in simulation III for each of methods 1–7, corresponding to the LRTpc, Full-SSU test, Main-SSU test, mLDC test, LD-only-SSU test, LD+Main-SSU test and minP test.

4. Discussion

We have proposed a general framework based on logistic regression to unify two general approaches to detecting genetic association based on unphased genotype data: rather than choosing between comparing genotype scores and contrasting LD patterns between the case and control groups, as implemented in the Hotelling's T2 test and LDC test respectively, we can construct tests under this unified framework to simultaneously assess the two aspects of the distributional difference. The main idea is to incorporate some LD measurements, as those for genotype scores, as covariates in a logistic regression model. In this way, we can take advantage of the score statistic and its associated nice asymptotic properties, to construct a class of suitable tests, such as the SSU, SSUw and UminP tests, in addition to the standard multivariate score test, whose null distributions can be easily estimated based on the asymptotic Normality of the score statistic. Even if one is only interested in contrasting LD patterns between the case and control groups, logistic regression offers a general framework to construct some alternative LDC-type tests. For example, rather than taking the simple average of the LD differences as in the LDC, mLDC and our proposed SSU tests, we can take a weighted average as in the SSUw test, or take the most significant difference as in the UminP test, to summarize the distributional difference between the two groups. As shown for contrasting genotypes scores, the above alternative tests can yield higher power under some situations [Chapman and Whittaker, 2008; Pan, 2009]. Importantly, due to the nice asymptotic properties of the score vector for logistic regression (or any generalized linear model, GLM), it is easy to derive the asymptotic null distributions and thus calculate the p values for these alternative tests. The logistic regression (or more generally GLM) framework not only makes it straightforward to accommodate other phenotypes (e.g. quantitative traits) and incorporate other covariates, such as environmental or other risk factors, but also facilitates the extension of the proposed tests to other robust nonlikelihood-based es timation methods [Zhao et al., 2003] or other study designs, e.g. family-based association studies [e.g. Fan and Xiong, 2003; Baksh et al., 2005], and to combining population- and family-based association studies [Chen and Lin, 2008; Hsu et al., 2009].

Our numerical results have confirmed the good performance of the proposed tests. As expected, if the distributional difference between cases and controls lies in both genotype scores and LD patterns (while neither aspect dominates the other), the SSU test based on an LD+main-effects logistic regression model was most powerful. On the other hand, when distributional difference is mainly in only one of the two aspects, the SSU (or SSUw) test based on one of the two models, a main-effects (for genotype scores) or an LD-only (for LD patterns) logistic model, was most powerful, while that based on the other model may be quite low powered. In contrast, under these situations, although the SSU (or SSUw) test based on an LD+main-effects logistic model may not be most powerful, it still main-tains high power. In particular, in all our examples, one of the SSU and SSUw tests based on an LD+main-effects logistic regression model was always more powerful than the LDC and mLDC tests, and than two recently proposed Normal-based LRT and LRTpc tests that compare the mean and covariance differences between the two groups. Furthermore, our proposed tests are computationally much faster than the four permutation-based tests.

Although we have focused on the application of any association test to a candidate gene or region, we may extend the methodology to genome-wide association studies. One strategy is to apply some existing algorithms to detect haplotype blocks as defined as chromosome regions with SNPs in high LD, then apply an association test to each of the haplotype blocks [Cardon and Abecasis, 2003]. A potential drawback is that haplotype blocks may not be uniquely defined or detected by different methods. An alternative is to apply an association test to several fixed-sized sliding windows along a chromosome [Durrant et al., 2004], or to all possible varying-sized windows with a small preset maximum number of SNPs inside [Cheng et al., 2005]. Guo et al. [2009] showed that, though computationally quite demanding, exploring all possible sliding-window sizes could improve statistical power over that based on only haplotype blocks or single SNP loci. In any case, our proposed tests can be coupled with a haplotype block searching or sliding-window strategy for application to genome-wide association studies.


The author is grateful to the reviewers for extremely helpful and constructive comments. The author thanks Dr Tao Wang for generously sharing his R code for the mLDC test, and Dr Xuexia Wang for clarifications on the LRT/LRTpc tests. This research was partially supported by NIH grants GM081535 and HL65462.


  • Abecasis GR, Cookson WO. GOLD – graphical overview of linkage disequilibrium. Bioinformatics. 2000;16:182–183. [PubMed]
  • Altshuler D, Daly M, Lander ES. Genetic mapping in human disease. Science. 2008;322:881–888. [PMC free article] [PubMed]
  • Baksh MF, Balding DJ, Vyse TJ, Whittaker JC. A likelihood ratio approach to family-based association studies with covariates. Ann Hum Genet. 2006;70:131–139. [PubMed]
  • Cardon LR, Abecasis GR. Using haplotype blocks to map human complex trait loci. Trends Genet. 2003;19:135–140. [PubMed]
  • Chapman JM, Whittaker J: Analysis of multiple SNPs in a candidate gene or region. Genet Epidemiol, Published Online: 21 Apr 2008.
  • Chatterjee N, Kalaliouglu Z, Moslehi R, Peters U, Wacholder S. Powerful multilocus tests of genetic association in the presence of gene-gene and gene-environment interactions. Am J Hum Genet. 2006;79:1002–1016. [PubMed]
  • Chen Y-H, Lin H-W. Simple association analysis combining data from trios/sibships and unrelated controls. Genet Epidemiol. 2008;32:520–527. [PubMed]
  • Cheng R, Ma JZ, Elston RC, et al. Fine mapping functional sites or regions from case-control data using haplotypes of multiple linked SNPs. Ann Hum Genet. 2005;69:102–112. [PubMed]
  • Clayton D, Chapman J, Cooper J. Use of unphased multilocus genotype data in indirect association studies. Genet Epidemiol. 2004;27:415–428. [PubMed]
  • Durrant C, Zongdervan KT, Cardon LR, et al. Linkage disequilibrium mapping via cladistic analysis of single-nucleotide polymorphism haplotypes. Am J Hum Genet. 2004;75:35–43. [PubMed]
  • Fan R, Knapp M. Genome association studies of complex diseases by case-control designs. Am J Hum Genet. 2003;72:850–868. [PubMed]
  • Fan R, Xiong M. Linkage and association studies of QTL for nuclear families by mixed models. Biostatistics. 2003;4:75–95. [PubMed]
  • Goeman JJ, van de Geer S, van Houwelingen HC. Testing against a high dimensional alternative. J R Stat Soc B. 2006;68:477–493.
  • Guo Y, Li J, Bonham A, Wang Y, Deng H: Gain in power for exhaustive analyses of haplotypes using variable-sized sliding window strategy: a comparison of association mapping strategies. To appear Eur J Hum Genet. Published online, 2009. [PMC free article] [PubMed]
  • Hayes MG, del Bosque-Plata L, Tsuchiya T, Hanis CL, Cox NJ. Patterns of linkage disequilibrium in the type 2 diabetes gene calpain-10. Diabetes. 2005;54:3573–3576. [PubMed]
  • Hsu L, Starr JR, Zheng Y, Schwartz SM. On combining triads and unrelated subjects data in candidate gene studies: An application to data on testicular cancer. Hum Hered. 2009;67:88–103. [PMC free article] [PubMed]
  • Johnson NL, Kotz S. Distributions in Statistics, Continuous Univariate Distributions, vol 2. Boston: Houghton-Mifflin; 1970.
  • Lin DY. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787. [PubMed]
  • Nielsen DM, Ehm MG, Zaykin DV, Weir BS. Effect of two- and three-locus linkage disequilibrium on the power to detect marker/ phenotype associations. Genetics. 2004;168:1029–1040. [PubMed]
  • Pan W: Asymptotic tests of association with multiple SNPs in linkage disequilibrium. To appear in Genet Epidemiol. Published Online on Jan 23, 2009. DOI: 10.1002/gepi.20402. Available at as Research Report 2008-018, Division of Biostatistics, University of Minnesota.
  • Plenge RM, et al. TRAF1-C5 as a risk locus for rheumatoid arthritis{a genomewide study. N Engl J Med. 2007;357:1199–209. [PMC free article] [PubMed]
  • Roeder K, Bacanu SA, Sonpar V, Zhang X, Devlin B. Analysis of single-locus tests to detect gene/disease associations. Genet Epidemiol. 2005;28:207–219. [PubMed]
  • Seaman SR, Muller-Myhsok B. Rapid simulation of p values for product methods and multiple testing adjustment in association studies. Am J Hum Genet. 2005;76:399–408. [PubMed]
  • Schaid DJ. Linkage disequilibrium testing when linkage phase is unknown. Genetics. 2004;166:505–512. [PubMed]
  • Wang T, Elston RC. Improved power by use of a weighted score test for linkage disequilibrium mapping. Am J Hum Genet. 2007;80:353–360. [PubMed]
  • Wang T, Zhu X, Elston RC. Improving power in contrasting linkage-disequilibrium patterns between cases and controls. Am J Hum Genet. 2007;80:911–920. [PubMed]
  • Wang X, Zhang S, Sha Q: A new association test to test multiple-marker association. Genet Epidemiol 2009 (in press). [PubMed]
  • Xiong M, Zhao J, Boerwinkle E. Generalized T2 test for genome association studies. Am J Hum Genet. 2002;70:1257–1268. [PubMed]
  • Zaykin DV, Meng Z, Ehm MG. Contrasting linkage-disequilibrium patterns between cases and controls as a novel association-mapping method. Am J Hum Genet. 2006;78:737–746. [PubMed]
  • Zhang J-T. Approximate and asymptotic distributions of Chi-squared-type mixtures with applications. J Am Stat Ass. 2005;100:273–285.
  • Zhao LP, Li S, Khalid N. A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case-control studies. Am J Hum Genet. 2003;72:1231–1250. [PubMed]
  • Zhao J, Jin L, Xiong M. Test for interaction between two unlinked loci. Am J Hum Genet. 2006;79:831–845. [PubMed]

Articles from Human Heredity are provided here courtesy of Karger Publishers