Home | About | Journals | Submit | Contact Us | Français |

**|**Europe PMC Author Manuscripts**|**PMC2691454

Formats

Article sections

Authors

Related links

Genet Epidemiol. Author manuscript; available in PMC 2009 June 4.

Published in final edited form as:

Genet Epidemiol. 2008 September; 32(6): 560–566.

doi: 10.1002/gepi.20330PMCID: PMC2691454

EMSID: UKMS4889

London School of Hygiene and Tropical Medicine, London, UK

Corresponding Author: Juliet Chapman, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK, Tel: 0207 6127841.

See other articles in PMC that cite the published article.

We consider the analysis of multiple SNPs within a gene or region. The simplest analysis of such data is based on a series of single SNP hypothesis tests, followed by correction for multiple testing, but it is intuitively plausible that a joint analysis of the SNPs will have higher power, particularly when the causal locus may not have been observed. However, standard tests, such as a likelihood ratio test based on an unrestricted alternative hypothesis, tend to have large numbers of degrees of freedom and hence low power. This has motivated a number of alternative test statistics. Here we compare several of the competing methods, including the multivariate score test (Hotelling’s test) of Chapman et al. [2003], Fisher’s method for combining p-values, the minimum p-value approach, a Fourier transform based approach recently suggested by Wang and Elston [2007] and a Bayesian score statistic proposed for microarray data by Goeman et al. [2005]. Some relationships between these methods are pointed out, and simulation results given to show that the minimum p-value and the Goeman et al. [2005] approaches work well over a range of scenarios. The Wang and Elston approach often performs poorly; we explain why, and show how its performance can be substantially improved.

Current genetic association studies produce information on many SNPs in a gene, region or, in more recent studies, over the entire genome, but it is not obvious how this information should be best analysed to detect and locate causal variants affecting the trait of interest. The simplest approach is to perform a separate test at each genotyped SNP: this raises multiple testing problems, but is otherwise straightforward, and is optimal where we are searching for a single causal variant that we believe has been included in the set of genotyped SNPs. However, we will often believe that multiple causal variants exist and, more importantly, current genotyping densities make it very unlikely that the causal variant has been genotyped: instead, we rely on the causal variant being highly correlated with one or more of the genotyped SNPs. This suggests that a joint analysis of SNPs in a gene or region may be preferable, since combining information across many SNPs may better predict the unobserved causal variant than any single SNP. A very large number of such methods have been suggested, ranging from sophisticated but computationally expensive coalescent-based probability models [Morris et al., 2002; Tachmazidou et al., 2007] to more ad hoc schemes for combining single SNP *p*-values, the best known being Fisher’s method.

Here we restrict our attention to a subset of approaches that generate test statistics based directly on genotype data in a candidate gene: such approaches are straightforward, and there is evidence that they are at least as powerful as alternatives, for instance haplotype based tests [Chapman et al., 2003]. Such tests can be applied to larger regions, or even to genome wide studies, by dividing the region in to a number of possibly overlapping windows and then testing the SNPs within each window. It is currently an open, and important, question as to whether this will outperform the single SNP tests currently favoured for genome wide studies [The Wellcome Trust Case-Control Consortium, 2007].

The approaches we consider can be divided into two types. The first performs a set of univariate tests and then attempts to combine these tests in some way: we consider the best known such approach, Fisher’s method for combining p-values, together with the standard approach of using the smallest *p* value observed in a region as a measure of significance within that region. The second are true multivariate tests. These are motivated by the observation that increasing the number of SNPs in a set will increase our ability to predict an unobserved causal variant, therefore suggesting an increase in power. However as the number of SNPs increases so too does the degrees of freedom (df) “spent” and for standard tests—for example a likelihood ratio test based on a regression model fitting a parameter per SNP—-this can result in a net loss of power. Accordingly, several authors have developed test statistics which use some sort of dimension reduction to reduce the number of degrees of freedom spent by a test for a given number of SNPs. Within this class, we examine a recent approach due to Wang and Elston [2007], who suggested a test statistic that reduces the effective degrees of freedom by using a weighted Fourier transform test that puts higher weight upon the low-frequency components (*i.e.* those explaining the most information). This method is related to a number of simple alternatives; we discuss these and note that these will often be preferable to the original Wang and Elston [2007] test. We also investigate an approach based on work by Goeman et al. [2005] on the analysis of high dimensional gene expression data, and not previously applied to association studies. Goeman et al define a general score test within an empirical Bayesian model, which assumes a central normal prior for the appropriate regression parameters, and suggest the variance of this prior to be a multiple of the corresponding identity matrix. Within a linear model framework Goeman et al. show that their test statistic effectively forces the test to put more weight upon the higher Eigen vectors of the matrix of genotype data, rather than the most informative Fourier transform components, as in Wang’s test statistic. We compare the power of these approaches with the usual multivariate score test (Hotelling’s *T*^{2} test) [Chapman et al., 2003; Xiong et al., 2002; Fan and Knapp, 2003].

We now give a more detailed description of the test statistics investigated, before comparing the methods by simulations based on real sequence data from two genomic regions. We will use two scenarios; firstly the case in which we have a set of tag SNPs representing the region and secondly the situation in which we have full sequencing data across the region. Throughout, we assume a case-control study design.

We begin by establishing some notation, and then give the test statistics to be considered. We assume that we have genotyped *N* individuals at *n* loci. We let *X _{i}* = (

The standard score test for association between locus *k* and case control status is

$${t}_{k}^{2}=\frac{{u}_{k}^{2}}{{v}_{k}},$$

where

$${u}_{k}=\sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y}){X}_{ik},$$

is the mean of * and *v _{k}* is the variance of

$${v}_{k}=\frac{1}{N}\sum _{i=1}^{N}{({Y}_{i}-\stackrel{\u2012}{Y})}^{2}\sum _{i=1}^{N}({X}_{i}-\stackrel{\u2012}{X}){({X}_{i}-\stackrel{\u2012}{X})}^{T}.$$

Since we have *n* test statistics we need to correct for multiple testing [Schaffer, 1995] by calculating the family wise error rate, that is the probability of at least one false discovery under the global null that no loci are associated to disease. This is equivalent to combining all single locus tests into a single test statistic through their maximum value (minimum p-value), giving

$${T}_{\mathit{minp}}=\underset{k=1}{\overset{n}{\mathrm{max}}}{t}_{k}^{2}.$$

In general this has an unknown null distribution and we therefore calculate the appropriate p-value using a standard permutation argument, in which we randomly permute the disease status across all individuals.

Fisher [1932] suggested combining information across multiple tests using the statistic

$${T}_{\text{fisher}}=-2\ast \sum _{j=1}^{n}\mathrm{log}\left({p}_{j}\right),$$

where *p _{j}* is the p-value corresponding to the single locus test at locus

Chapman et al. [2003] show that when assuming a single underlying causal locus, and a genotype based model, the appropriate multivariate score test statistic can be defined by

$${T}_{\text{usual}}={U}^{T}{V}^{-1}U,$$

where

$$\begin{array}{cc}\hfill U=& \sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y}){X}_{i}\hfill \\ \hfill =& {X}^{T}(Y-\stackrel{\u2012}{Y})\hfill \end{array}$$

is a vector of length *n* and *V* is the estimated null variance-covariance matrix of *U*,

$$\begin{array}{cc}\hfill V=& (N-1)\ast \sum _{i=1}^{n}{({Y}_{i}-\stackrel{\u2012}{Y})}^{2}\ast \sum _{i=1}^{n}({X}_{i}-\stackrel{\u2012}{X}){({X}_{i}-\stackrel{\u2012}{X})}^{T}\hfill \\ \hfill =& (N-1){(Y-\stackrel{\u2012}{Y})}^{T}(Y-\stackrel{\u2012}{Y})\ast {(X-\stackrel{\u2012}{X})}^{T}(X-\stackrel{\u2012}{X}).\hfill \end{array}$$

Under the null that there is no associated locus within the region, this usual test statistic has an asymptotic chi-square distribution on *n* degrees of freedom. This test is equivalent to the score test for a logistic regression model in which each *x _{ij}* is included as an explanatory variable of

The test statistic of Goeman et al. [2005] is closely related to *T _{usual}* above, being based on the logistic regression model in which each

More precisely, the test statistic of Goeman et al. is based upon the likelihood, *f* (*β*;*D*), where *D* represents the data and *β* the parameters of the model, and assumes a prior for *β* = *τ***b** such that E(**b**) = 0 and E(**bb**^{T}) = Σ. Goeman et al. [2005] show that the empirical Bayesian score test then has the form

$${T}_{\text{goeman}}=\frac{1}{2}({U}_{f}^{T}\Sigma {U}_{f}-\text{trace}\left(\Sigma {I}_{f}\right)),$$

where *U _{f}* is the score function relating to likelihood

$$\begin{array}{cc}\hfill {I}_{f}=& \stackrel{\u2012}{Y}(1-\stackrel{\u2012}{Y})\ast \sum _{i=1}^{n}({X}_{i}-\stackrel{\u2012}{X}){({X}_{i}-\stackrel{\u2012}{X})}^{T}\hfill \\ \hfill =& \stackrel{\u2012}{Y}(1-\stackrel{\u2012}{Y})\ast {(X-\stackrel{\u2012}{X})}^{T}(X-\stackrel{\u2012}{X}),\hfill \end{array}$$

leading to a test statistic of the form

$${T}_{\text{goeman}}=\frac{1}{2}({(Y-\stackrel{\u2012}{Y})}^{T}X{X}^{T}(Y-\stackrel{\u2012}{Y})-\stackrel{\u2012}{Y}(1-\stackrel{\u2012}{Y})\ast \text{trace}\left({(X-\stackrel{\u2012}{X})}^{T}(X-\stackrel{\u2012}{X})\right)).$$

Intuitively, this produces a test which ignores correlation between SNPs, relative to tests not assuming independent SNP effects; this increases power for certain alternatives and reduces it for others. In particular, this test will have increased power when positively correlated SNPs have effects in the same direction, since the deviations from expectations under the null at each SNP are combined without adjustment for the positive correlation between the SNPs, but reduced power when positively correlated SNPs act in opposite directions or negatively correlated SNPs in the same direction. Since the first situation is believed to be common in genetic association studies, arising for instance when several observed SNPs are in Linkage Disequilibrium (LD) with an unobserved causal variant, whilst the second would require two nearby loci to act in opposite directions or particular epistatic patterns, we might expect a net gain in power.

This test statistic has no known distribution and we can use the same permutation argument to estimate appropriate critical values under the global null hypothesis. Note, however, that under permutation only the first part of this test statistic, equating to *U ^{T}U*, is random and that under the null the

Wang and Elston [2007] suggest an alternative multivariate test that aims to reduce the test degrees of freedom by using a weighted Fourier transform of the genotypes. The genotype codes for individual *i*, (*x*_{i1},...,*x _{in}*) are transformed into a sequence of numbers (

$$\left({z}_{ik}\right)=\text{Real}\left(\sum _{j=1}^{m}{x}_{ij}\mathrm{exp}\left(-\frac{2\pi \sqrt{(}-1)(k-1)(j-1)}{n}\right)\right),$$

for *i* = 1,...,*N*. Note that we have indexed the *z*’s from 1 to *n*, rather than 0 to (*n* − 1) as in Wang and Elston [2007]. These new values now form *n* Fourier transform components, *Z _{j}* = (

$${u}_{ft:k}=\sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y})\left({z}_{ik}\right),$$

and *v*_{ft:k} is simply the estimate of the null variance of *u*_{ft:k}, calculated in the same way as before. Wang and Elston [2007] then combine these Fourier score components into a single test statistic by choosing some weighting, *w* of the components, where more weight is given to the lower-frequency components (*i.e.* smaller k) since these are the components with more smoothing. The test suggested by Wang and Elston [2007] is therefore defined by

$${T}_{\mathit{wang}.ft}=\frac{{w}^{T}{U}_{ft}}{\sqrt{{w}^{T}{V}_{ft}w}},$$

where *U _{ft}* = (

We now make a number of comments about the Wang and Elston [2007] approach, and suggest some alternative test statistics.

First, note that the weighting *w _{k}* = (1/(

Following the standard multivariate test, the points above suggest that we could drop the linearly dependent higher-frequency Fourier components and base a test on just the the “m” linearly independent real Fourier components, since this should automatically (approximately) halve the df whilst retaining much of the information. Therefore, we can simply define a multivariate score test based on the linearly independent Fourier components:

$${T}_{\mathit{li}.ft}={U}_{\mathit{li}}^{T}{V}_{\mathit{li}}^{-1}{U}_{\mathit{li}},$$

where *U _{li}* = (

These points also suggest that another possible test statistic might be based upon the full Fourier components, including both real and imaginary parts, since this full transformation is one-to-one and all original information is retained. However such an approach is complicated by the imaginary components and will not be considered further within this paper.

We note also that within the examples of Wang and Elston [2007] it appears that much of the information is found within the score for the first Fourier component which is of the form:

$$\begin{array}{cc}\hfill {u}_{ft:1}=& \sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y}){z}_{i1}\hfill \\ \hfill =& \sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y})\left(\sum _{j=1}^{m}{x}_{ij}\mathrm{exp}\left(-\frac{2\pi \sqrt{(}-1)(j-1)\ast 0}{n}\right)\right)\hfill \\ \hfill =& \sum _{i=1}^{N}({Y}_{i}-\stackrel{\u2012}{Y})\left(\sum _{j=1}^{m}{x}_{ij}\right).\hfill \end{array}$$

This is equivalent to the score for the logistic regression of *Y* upon a single term equal to the sum of genotypes across all *n* loci and has a chi-square distribution on one degree of freedom under the null. We denote the corresponding test by *T*_{1.ft}.

Finally, we point out that if negative correlation exists between pairs of loci, *T*_{1.ft} (and perhaps *T _{wang.ft}*) will perform poorly, as the effects of such loci will cancel. For example, consider two associated loci in perfect, but opposite LD, so that the homozygote coded as 2 at the first locus is always found with the 0 homozygote at the second locus and vice versa; these loci are perfectly negatively correlated and cancel completely. Wang and Elston [2007] recommend recoding the genotypes to avoid this (so that in the above we would swap the 0 and 2 labels for the homozygotes at one of the loci), but it is not always possible to find a recoding removing all negative correlations. Table I illustrate such an example, where the correlation between

Simple example in which swapping allele codings will never produce positive pairwise correlations between all loci

Since the first Fourier component appears to dominate *T _{wang.ft}*, we would expect similar issues arise there also. We will therefore, as suggested in Wang and Elston [2007], also consider the performance of these test statistics in the subset of loci that gives the maximal number of positively correlated loci (

We illustrate the relative performance of the methods described using a simulation study. We take sequence data on 96 individuals from two regions, one of 24 Kb surrounding CTLA4 and one of 48 Kb surrounding IL21R, and use these to define population genotype frequencies for the different regions. We did this by first estimating the population haplotype frequencies, using the EM based *snphap* program [Clayton, 2003], and then assuming Hardy-Weinberg equilibrium (HWE) to generate genotype frequencies. We also consider the region considered by Wang and Elston [2007], CHI3L2. We downloaded estimated haplotype frequencies within the 90 CEU individuals from the hapmap website [The International HapMap Consortium, 2007] and used these to generate population genotype frequencies under HWE. As in Wang and Elston [2007], we selected only those 22 loci with allele frequency above 20%. Details of the three regions are in table II. This table demonstrates that CHI3L2 has high levels of LD and by design also has high allele frequencies and CTLA4 has moderately high levels of LD and relatively high allele frequencies. IL21R, on the other hand, has lower levels of LD and slightly lower allele frequencies. For allele frequencies based upon each region, we also consider the extreme scenario in which there is no LD at all and all loci are independent of one another. This situation is particularly simple to simulate since each genotype is sampled independently based upon the allele frequencies of the observed loci within the region and assuming Hardy-Weinberg equilibrium (HWE).

We then sample *M* = 1000 case-control data sets using these frequencies and a disease model in which for each replicate we sample, uniformly at random, one of the loci within this region to be the causal locus and assume that this acts in a multiplicative way with a relative risk of 1.3. Hence the causal locus varies amongst the *M* data sets, but the relative risk remains fixed. We consider analyses both of each complete data set (including the causal locus) and of reduced data sets based on choosing tag SNPs for each region, using a method based on pairwise LD. The tag SNPs are chosen based upon all loci within the region, including the causal locus, and therefore the causal locus may or may not be included within the set of tagging SNPs. Note that tag SNPs are reselected for each replicate dataset and thus free to vary over replicates.

Since our simulation study needs many replicates we require a quick method of tag SNP selection. For this reason we use a simple cluster analysis to group similar loci and then choose the single locus with highest allele frequency from each group as a tag SNP. Our clustering approach uses Euclidean distance measures and defines distance between clusters according to the average distance between loci. We choose the number of clusters (therefore tag SNPs) to be the smallest number that give a mean *R*^{2}, between all “common” untagged loci and the set of tag SNPs, equal to or less than 0.8. We define “common” to be those loci whose minor allele frequencies are greater than or equal to 0.05. Note that for the IL21R and CHI3L2 regions in table II, the Av. Mean *R*^{2} are below 0.8. This is due to the fact that the average in the table is taken across all loci rather than across just the “common” loci. Similar tag SNP selection methods have been implemented by many others including Byng et al. [2003].

We sample datasets of 2000 cases and 2000 controls and for each dataset we compute the test statistics described above and calculate their appropriate p-values, either based upon asymptotic distributions or by permutation when these are not available. We can then estimate the power of each test, for a given critical level (*α*), to be the proportion of datasets for which we find a p-value more extreme than *α*. We consider *α* levels of 0.01 and 0.001.

Results of the simulation study are given in table III. We see that Goeman’s Bayesian score test *T _{goeman}* and the minp statistic

As may be expected [Chapman et al., 2003], *T _{usual}* has competitive power when using tag SNPs but has lower power when all loci are included in the test statistic. The other test statistics are much less sensitive to the number of loci included and give similar performance on tag SNPs and on the complete data. Fisher’s combined p-value performs very well, particularly when there are many loci in high LD, but has low power when LD is low or in the region. This should be expected, since when there is high LD within a region all/many of the loci are likely to be highly correlated with the causal allele and therefore all/many are likely to have non-null p-values, whilst, when there is low LD, few are likely to be related to the causal locus and therefore only few of the p-values will be non-null: therefore multiplying together p-values as

Wang and Elston [2007] found their test, *T _{wang.ft}*, to outperform several other test statistics, including

We have compared the power of a number of multivariate test statistics in a simulation study based on observed patterns of LD and allele frequencies. Our most striking finding is the poor performance of the test statistic recently proposed by Wang and Elston [2007]. We argue above that this to be expected on theoretical grounds, but these results contradict simulation results presented in Wang and Elston [2007], based on artificial LD patterns and on the CHI3L2 region, in which *T _{wang.ft}* was found to be more powerful than Bonferroni correction of single SNP tests, permutation correction of single SNP test and a logistic regression based likelihood ratio test, and so are worthy of further discussion.

We show above that Wang and Elston [2007] were fortuitous in their choice of causal SNP in CHI3L2, and that in general other tests have greater power than *T _{wang.ft}* on this region. We also believe their results using artificial LD patterns were due to unrealistic features of these simulations. Wang and Elston [2007] consider 4 or 10 markers, with allele frequencies between 0.2 and 0.8 and assume that the causal variant is in the centre of the region studied, with all correlations positive and either equal, given by 0.8

Our results show the importance of comparing procedures over a range of realistic scenarios, ideally based on real data: the relative performance of methods can be strongly dependent on setting. For example, Fisher’s method worked very well for CTLA4 where LD is high, but poorly when there was no LD between loci. However, both Goeman’s Bayesian score test and the minimum p-value test consistently obtain high (often the best) power across all scenarios considered, whilst *T _{wang.ft}* is constantly the worst performer considered and is for example dominated by

We are grateful to our colleagues in the Diabetes and Inflammation Laboratory for sharing the SNP sequence data used within our simulations. The work was funded by the Wellcome Trust (JC’s grant number EPNCBC49) and the Juvenile Diabetes Research Foundation.

- Byng MC, Whittaker JC, Cuthbert AP, Mathew CG, Lewis CM. Snp subset selection for genetic association studies. Ann Hum Genet. 2003;67:543–556. [PubMed]
- Chapman JM, Cooper JD, Todd JA, Clayton DG. Detecting disease associations due to linkage disequilibrium using haplotype tags: A class of tests and the determinants of statistical power. Hum Hered. 2003;56:18–31. [PubMed]
- Clayton D. SNPHAP, a program for estimating frequencies of haplotypes of large numbers of diallelic markers from unphased genotype data from unrelated subjects. 2003. http://www-gene.cimr.cam.ac.uk/clayton/software/
- Fan R, Knapp M. Genome association studies of complex diseases by case-control designs. Am J Hum Genet. 2003;72:850–868. [PubMed]
- Fisher RA. Statistical methods for research workers. 4 edition OLiver and Boyd; London: 1932.
- Goeman JJ, van de Geer SA, van Houwelingen HC. Testing against a high-dimensional alternative. Journal of the Royal Statistical Society Series B. 2005;68:477–493.
- Morris AP, Whittaker JC, D B. Fine-scale mapping of disease loci via shattered coalescent modeling of genealogies. Am J Hum Genet. 2002;70:686–707. [PubMed]
- Schaffer JP. Multiple hypothesis testing. Ann Rev Psych. 1995:46.
- Tachmazidou I, Verzilli CJ, Iorio MD. Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genetics. 2007:3. [PMC free article] [PubMed]
- The International HapMap Consortium 2007. www.hapmap.org.
- The Wellcome Trust Case-Control Consortium Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–78. [PMC free article] [PubMed]
- Wang T, Elston RC. Improved power by use of a weighted score test for linkage disequilibrium mapping. The american journal of human genetics. 2007;80:353–360. [PubMed]
- Xiong M, Zhao J, Boerwinkle E. Generalized
*t*^{2}test for genome association studies. American Journal of Human Genetics. 2002;70:1257–1268. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |