Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Genet Epidemiol. Author manuscript; available in PMC 2012 November 1.
Published in final edited form as:
PMCID: PMC3197760

Optimal Methods for Meta-analysis of Genome-wide Association Studies


Meta-analysis of genome-wide association studies involves testing single nucleotide polymorphisms (SNPs) using summary statistics that are weighted sums of site-specific score or Wald statistics. This approach avoids having to pool individual-level data. We describe the weights that maximize the power of the summary statistics. For small effect-sizes, any choice of weights yields summary Wald and score statistics with the same power, and the optimal weights are proportional to the square roots of the sites' Fisher information for the SNP's regression coefficient. When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its standard deviation. We give simple approximations to the optimal weights for various phenotypes, and show that weights proportional to the square roots of study sizes are suboptimal for data from case-control studies with varying case-control ratios, for quantitative trait data when the trait variance differs across sites, for count data when the site-specific mean counts differ, and for survival data with different proportions of failing subjects. Simulations suggest that weights that accommodate inter-site variation in imputation error give little power gain compared to those obtained ignoring imputation uncertainties. We note advantages to combining site-specific score statistics, and we show how they can be used to assess effect-size heterogeneity across sites. The utility of the summary score statistic is illustrated by application to a meta-analysis of schizophrenia data in which only site-specific p-values and directions of association are available.

Keywords: combining GWAS, effect-size heterogeneity, meta-analysis, noncentrality parameter, score statistics, Wald statistics


Combining data from multiple genome-wide association studies (GWAS) of a common outcome has emerged as a major tool for identifying susceptibility loci for human disease and other conditions [Scott et al., 2007; Zeggini et al., 2007; Shi et al., 2009]. The goals are to discover new variants missed by the individual studies, to identify variants for outcomes not considered when the data were collected (e.g. cancer survival), and, for associated variants, to assess effect-size and its possible variation across sites. A challenge to achieving these goals is the need to assess and accommodate inter-site differences in study characteristics, such as demographic attributes of the populations studied, phenotypic aspects such as disease severity and extent of censoring of survival data, the SNPs included in the genotyping platforms used, and the choice and coding of covariates in need of adjustment.

The pooled analysis of individual-level data on phenotypes, genotypes and covariates has several advantages. Quality control can be implemented uniformly for all sites, and SNP genotypes can be imputed using data from all sites. Covariates common to multiple studies can be coded uniformly and their regression coefficients can be estimated with all available data. Optimal likelihood-based methods can be used to test whether SNP regression coefficients are nonzero and whether they vary across study sites. Offsetting these advantages is the labor involved in assembling the raw data from each site and coding common covariates, and privacy issues which could limit the sharing of genotypes.

An alternative to pooled analysis is the meta-analysis of site-specific test statistics, each having approximately a standard normal distribution under the null hypothesis of no association. Here each site types or imputes genotypes for a common set of SNPs, and then calculates a covariate-adjusted statistic for each SNP. The site-specific statistics are either Wald statistics (ratios of regression coefficient estimates to their estimated standard deviations (SDs), or score statistics (ratios of efficient scores to estimates of their null SDs) [Soranzo et al., 2009; Tanaka et al., 2009]. The coordinating center then computes a weighted sum of these statistics as a summary test statistic for the SNP [Cantor et al., 2010]. For reviews of issues in GWAS meta-analysis, see [Ioannidis 2007; Ioannidis et al., 2007; de Bakker et al., 2008; Guan and Stephen, 2008; Zeggini and Ioannidis, 2009; Cantor et al., 2010].

When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its estimated standard deviation [Cohran 1954]. However there are practical advantages to combining score statistics rather than Wald statistics for GWAS meta-analyses. First, the score statistics do not require iterative parameter estimates for each SNP to be tested. Estimates for the site-specific covariate effects need be calculated just once under the global null hypothesis of no effect for any SNP; thus score statistics provide fast assessment of the statistical significance of large numbers of SNPs. Second, a SNP score statistic can be computed even when the site has provided only the SNP's p-value and direction of association. Third, the summary score statistic may be easier to interpret when the sites differ with respect to the covariates included in their regression models or with respect to their phenotype definition, measurement or coding. For example, a GWAS of nicotine addiction among current and former smokers might include sites whose outcome is a count of smoking cessation attempts, and other sites that simply classified subjects according to presence or absence of a successful cessation attempt. For a meta-analysis such as this, a combination of site-specific score statistics may be easier to interpret than a combination of site-specific Wald statistics, since the SNP regression coefficients in the latter are not comparable. Finally, site-specific score statistics have straightforward extensions to accommodate uncertainty due to imputation of unobserved genotypes, and we shall show that they can be used to assess heterogeneity of effect-size across sites.

Here we focus on optimal weights for combining site-specific statistics to identify new variants and, for associated variants, to assess possible effect-size variation across sites. Asymptotically optimal weights for combining Wald statistics have been known for more than half a century [Cochran 1954; DerSimonian and Laird, 1986]. Surprisingly however, despite their practical advantages, optimal weights for combining score statistics are not well known. In fact, some investigators suggest or use weights proportional to the square roots of the study-specific sample sizes [Soranzo et al., 2009; Willer et al., 2010; Hu et al., 2011]. We show that this strategy can be suboptimal when combining data from sites whose phenotypes have different null distributions. Instead, for the small effect-sizes expected of GWAS meta-analysis, the optimal weights for combining score statistics are essentially the same as those used to combine Wald statistics. We provide explicit forms for the optimal weights for a general class of phenotypes that includes binary outcomes, quantitative traits, counts of events and censored survival outcomes.

In the following Methods section we begin by reviewing the score and Wald statistics for testing the null hypothesis of no SNP-phenotype association at a single site, with application to case-control studies, quantitative traits, count data and censored survival data. We also show how the site-specific score statistics can be extended to handle imputation uncertainties, as noted by Marchini et al., [2007]. Then, for each of these phenotypes, we describe the optimal weights for combining site-specific score or Wald statistics. We also extend the weights to handle SNP effect-sizes that vary across sites, and we show how the site-specific score statistics can be used to assess inter-site heterogeneity. The Methods section is followed by an evaluation of power for various summary score and Wald statistics, based on simulated case-control and censored survival data. An application to schizophrenia data shows that for binary outcomes, the score statistics can be used to assess inter-site effect-size heterogeneity even when only site-specific SNP p-values and direction of association are available. The final section concludes with a brief discussion.


Score and Wald Statistics for One Site

For simplicity we model the effects of each SNP as additive on an appropriate scale (the so-called trend or gene dosage model), though other genotype models can be used. Let (y, g, x) = {(yj, xj, gj), j = 1, …, n} represent the data for the n study subjects. For subject j, yj is the phenotype, xj is a column vector of covariates whose first component may be one to accommodate an intercept, and gj = (gj1, gj2, …) is a vector of minor allele counts for each of the SNPs under study. We focus on testing association for just one of these SNPs, and omit its subscript. We assume that the loglikelihood [ell](y[mid ]x, g; θ) for the phenotypes y conditional on covariates x and genotypes g depends on a parameter θ = (α, β). Here α is a s-dimensional vector of parameters corresponding to an intercept and/or any covariates used by the site, and β specifies the trend relating phenotype to SNP minor allele count.

We wish to test the null hypothesis H0: θ = θ0 = (α, 0). To do so, we introduce the Fisher information for θ, given by

equation M1

Here iαα is the s × s matrix whose entries are the negative expectations of the second derivatives of the loglikelihood function for θ with respect to α, and the other submatrices are defined analogously. Also let equation M2 denote the (s + 1)-dimensional score vector whose entries are the derivatives of the loglikelihood function with respect to θ = (α, β). The covariate-adjusted efficient score for testing β = 0 is Uβ = Uβ ([alpha](0), 0), where [alpha](0) satisfies Uα ([alpha]k (0), 0) = 0. Under H0 and mild regularity conditions [Cox and Hinkley, 1979], Uβ has an asymptotic Gaussian distribution with mean zero and variance

equation M3

evaluated at ([alpha](0), 0).

The score and Wald statistics are

equation M4

where îβ is a consistent estimate of iβ evaluated at β = 0, [beta] is the maximum likelihood estimate (MLE) of β, and equation M5 evaluated at [beta]. The two statistics are locally asymptotically equivalent, i.e., they are asymptotically equivalent when β = 0 and are approximately so when β is small in absolute value [Cox and Hinkley, 1979]. Their noncentrality parameters (NCPs) are equation M6, with iβ evaluated at θ0 = (α, 0) for Zscore and at θ = (α, β) for ZWald. In the following we shall approximate iβ for the Wald statistic by evaluating it at its null value, and refer to a common NCP ξ for both statistics.

We now specialize the score and Wald statistics to specific phenotypes of relevance to GWAS. We first consider phenotypes whose relation to genotypes and covariates can be described by generalized linear models (GLMs) [McCullagh and Nelder, 1989]. Special cases of particular relevance are the binary phenotypes of case-control studies, quantitative traits such as height or weight, and counts of events such as attempts to quit smoking, and censored survival data.

Example: generalized linear models

GLMs assume that the mean but not the dispersion parameter of the phenotype distribution depends on genotypes and covariates, so that their effects can be modeled as a function of a linear predictor ηj = αxj + βgj. The first component of the covariate vector xj may equal one, in order to accommodate an intercept. The loglikelihood is

equation M7

where a (·), b (·) and c (·) are known functions, and ϕ is the dispersion parameter of the model. The score for β corresponding to (4) is

equation M8

where [mu]j is the fitted mean of y under H0, and [var phiv with circumflex] is a consistent estimate of ϕ. The null Fisher information (1), with θ = θ0, and ηj = αxj is

equation M9

In the absence of covariates xj = 1, ηj = α1 where α1 is the intercept, and (6) reduces to

equation M10

where p = 1 − q is the SNP minor allele frequency (MAF). Using (7) in (2), we find that the null Fisher information for β is

equation M11

For binary phenotypes yj is coded as one if subject j has the trait and zero otherwise, and the logistic model corresponds to ϕ = a (ϕ) = c (y; ϕ) = 1, and b (ηj) = ln [1 + eηj]. In a case-control study with n1 cases and n0 = nn1 controls, b″ (α1) can be estimated consistently by n0n1/n2.

Substituting these expressions in (8) gives

equation M12

For quantitative traits whose distribution is (possibly after transformation) Gaussian with mean μj = ηj and variance ϕ, we have a (ϕ) = ϕ, equation M13 and equation M14. Thus b″ (α1) = 1. Substitution of these values for a (ϕ) and b″ (α1) into (8) gives

equation M15

where [phi with hat] is a consistent estimate of the phenotype variance.

For Poisson count data, a (ϕ) = 1, b (ηj) = eηj and c (y; ϕ) = 0. Thus b″ (α1) = eα1 = μ, where μ is the null mean of the counts. Substituting these values into (8) gives

equation M16

Example: censored survival data

Here the data are equation M17 where for subject j, tj is survival time and εj is a failure indicator. We assume independent times to failure and censoring, each with hazard that may vary across studies. We also assume that the failure distribution follows a proportional hazards model of the form h (t; xj, gj) = h* (t) eαxj+βgj, where h* (t) is the baseline hazard, and that the m failures occur at distinct times. Inferences are based on the log of a partial likelihood function for β, given by equation M18, where m is the number of failures, gi1 and xi1 are, respectively, the genotype and covariates of the ith failure, and Ri is the set of subjects at risk just before this failure, i = 1, …, m. The corresponding score for β is

equation M19

In the absence of covariates, the null Fisher information for β, as given by (2) and (1), is

equation M20

Accommodating imputed genotypes

Meta-analysis of site-specific statistics requires local imputation at the study sites. This involves calculating posterior probabilities Pjg for each subject's unobserved SNP minor allele count g, with Pj0 + Pj1 + Pj2 = 1. Here Pjg is the conditional probability that subject j has allele count g, given the observed genotype data Obs, and the haplotypes in a reference panel such as HapMap2 [International HapMap Consortium et al., 2007]. Louis [1982] provided a likelihood-based extension to the score statistic to accommodate this imputation. Inferences are based on the probability of phenotypes conditional on observed genotype data Obs, and covariates x:

equation M21

Here g denotes a vector of genotypes for the nk subjects, and the posterior probabilities Pr (g[mid ]Obs) = Pr(g) / Σg[set membership]G Pr (g′), where Pr (g) is based on the Hardy-Weinberg MAF, and the sum is taken over genotypes in the set G consistent with the observed data Obs and the haplotypes in the reference panel. The posterior probabilities can be computed using available imputation software [Li and Abecasis 2006; Scheet and Stephens, 2006; Browning BL and Browning SR, 2009; Howie et al., 2009]. As shown by Louis [1982], the score for β is

equation M22

where equation M23 is the complete data score and the expectation is taken with respect to the posterior distribution Pr (g[mid ]Obs) for the SNP of interest. The null variance of Uβ is given by (2) and (1), but with the Fisher information for the observed data reduced by a penalty due to the uncertainty in genotypes:

equation M24

where Jc is the negative of the second derivative matrix for the complete data. The score statistic is Zscore = Uββ;where îβ is obtained from (2) and (16).

The score Uβ of (5) and (12) for GLMs and survival data depends linearly on the genotypes gj. Thus in the presence of unobserved genotypes, the posterior expectation (15) can be obtained simply by replacing the unobserved allele counts with their posterior means, as noted by Schaid et al., [2002] and Marchini et al., [2007]. In addition, the entries of the information matrix (16) are quadratic forms in the genotypes. Thus their posterior expectations can be computed as functions of the first and second moments of the posterior genotype distributions. In practice however, imputation uncertainties are generally ignored for both score and Wald statistics; instead, subjects' unknown minor allele counts are replaced by their posterior means Pj1 + 2Pj2.

Meta-Analysis of Data from K Sites

Suppose now that each of K sites has computed a score or Wald statistic using a regression model that may include an intercept and/or covariates. We wish to combine these site-specific statistics into a single summary statistic to assess the overall evidence for or against association between a SNP and a given phenotype. We assume that subjects participate in only one study (see [Lin and Sullivan, 2009] for methods applicable to studies with overlapping study subjects). We first describe the weights optimal under the assumption that the SNP effect-size β is common across sites (homogeneous effect-size), and then consider the weights optimal when the effect-sizes β1, …, βK vary across sites (heterogeneous effect-size).

Homogeneous effect-size

Here the null hypothesis to be tested is H0: θ = θ0 = (α1, …, αK, 0), where αk is the sk-dimensional nuisance parameter for study k, k = 1, …, K. The summary statistics used for this purpose in GWAS meta-analysis are weighted sums equation M25, where Zk is either a site-specific score statistic or a site-specific Wald statistic, and where the weights wk are nonnegative constants with equation M26. Under H0, both summary statistics, Zscore and ZWald, have standard Gaussian distributions asymptotically [Cox and Hinkley, 1979]. The optimal weights wk are defined as those that maximize the summary statistic's NCP, given by ξ = Σkwkξk, where ξk is the NCP for the score and Wald statistics at site k, k = 1, …, K. We have seen that the NCP for both site-specific score and site-specific Wald statistics are equation M27; thus equation M28. A straightforward application of the Cauchy-Schwartz inequality shows that the weights wk that maximize ξ under local alternatives are proportional to equation M29. Thus the optimal summary score and Wald statistics are given by

equation M30

They are asymptotically equivalent for testing θ = θ0 under the null hypothesis and under local alternatives. Replacing the the terms Zk:score with the score statistics of (3) in the first equality of (17) shows that the summary Zscore is just the well-known stratified score statistic, i.e., the one obtained from the sum of the K loglikelihood functions:

equation M31

For case-control data, Zscore is the Mantel-Haenzel test for trend in a 2 × K table [Breslow and Day, 1980]. Moreover, replacing the terms Zk:Wald with the Wald statistics of (3) in the second equality of (17) shows that the summary ZWald is the commonly deemed inverse-variance-weighted summary statistic proposed by Cochran [1954] and given by

equation M32

The term “inverse-variance-weighted” refers to the fact [beta] is a weighted sum of the site-specific regression coefficients [beta]k, weighted in inverse proportion to their estimated variances:

Table 1 gives the optimal weights for case-control data, quantitative trait data, Poisson count data and censored survival data, when the participating sites do not include covariates in the model, and do not adjust for imputation. The weight wk for a given site and phenotype is proportional to the square root of the corresponding estimated null information î for β, as given by the appropriate choice of expressions (9-13). In contrast, the weights based on study size are proportional to equation M33. Comparison of these size-based weights to those in Table 1 shows that, even though SNP allele frequencies may be similar across sites, the size-based weights are suboptimal for case-control data from studies with varying case-control ratios, for quantitative trait data with varying trait variances, for count data with varying null expected counts, and for censored survival data with varying numbers of failures. Since the variance of a quantitative trait depends on the covariates included in the regression model which may vary across sites, using a common trait variance may in some circumstances be substantially suboptimal.

Table 1
Optimal Weights in GWAS Meta-analysis without Covariates

In the presence of site-specific adjustment for covariates and/or imputation uncertainty, the optimal weights are more complex than those shown in Table 1. According to (17), they are proportional to the square roots of the null SNP information î, which depend on the covariate distributions, covariate effects, and imputation uncertainty specific to each site. In principal, the values î could be provided by the sites. Instead, Zaitlen and Eskin [2010] have suggested accommodating imputation uncertainty by incorporating in each site's weight an estimate of the correlation coefficient rk between true and imputed SNP allele counts. In the following section we use simulations to evaluate the power of this strategy.

Heterogeneous effect-size

Site-specific effect-size variation is plausible for several reasons. First, the studies may have used different genotyping platforms with different numbers and locations of SNPs. This results in differences across sites in levels of genotype measurement error, creating different levels of regression coefficient attenuation across sites. Second, when covariates or their coding differ across sites, different levels of control for confounding can affect SNP effect-sizes. For example, genetic ancestry may be associated with both phenotype and genotypes, and variable control for ancestry may induce differential levels of residual confounding. Third, for survival data, site-specific variation may occur if effect-size changes over time and the studies differ in level and type of censoring. Finally, the SNP of interest may interact with other risk factors whose prevalences vary across sites, which would cause variation in effect-size.

The site-specific Wald statistics have the form Zk:Wald = [beta]k/SD ([beta]k), where [beta]k is the MLE for β and equation M34. These quantities lend themselves naturally to evaluating SNP effect-size heterogeneity across sites, using the methods of Cochran [1954] and DerSimonian and Laird [1986]. Here we show that the components of the site-specific score statistics equation M35 also can be used for this purpose. Specifically, when βk is small in absolute value, the asymptotic mean [var phi]k of U can be approximated as [var phi]k~ βki (see Appendix). This approximation suggests estimating βk by equation M36, with SD ([beta]k) estimated as equation M37, k = 1, …, K. Then Cochran's homogeneity test can be applied with [beta]k equal to either the MLE or the quantity equation M38.

For either the score-based or Wald-based estimates [beta]1, …, [beta]K, Cochran's test is based on the statistic Q = Σkîk ([beta]k[beta])2, where [beta] is given by (18). Under the homogeneity hypothesis β1 = … βK = β, Q has a chi-squared distribution on K − 1 degrees of freedom (DF). When homogeneity is rejected, DerSimonian and Laird [1986] propose estimating the inter-site variance τ2 of the βk as

equation M39

where equation M40, r = 1, 2. The overall effect-size is estimated as

equation M41

Corresponding summary statistics for testing β1 = … = βK = 0 are

equation M42

and Zk denotes either Zk:score or Zk:Wald. When Cochran's test does not reject homogeneity and [tau]2 is set to zero, the two summary statistics (19) reduce to their usual forms (17) optimal for homogeneous effects.

In conclusion, one can choose between four types of summary statistics, each formed as a weighted sum of site-specific statistics. The four summary statistics differ according to the site-specific statistics (score vs Wald) and whether the weights are chosen to be optimal under effect-size homogeneity (ZHM) or heterogeneity (ZHT).


We simulated case-control and survival data to evaluate the performances of the optimal summary score and Wald statistics, and to compare them to that of the size-weighted summary ZSZ, obtained using weights proportional to the square roots of study sizes. For each replication, we generated data assuming both homogeneous and heterogeneous effect-sizes, with and without imputation error, and then computed the various summary statistics. We flagged those exceeding certain thresholds, and then computed for each statistic the proportion of all replications in which it was flagged.

Type-1 Error

To evaluate type 1 error, we compared nominal to empirical type-1 error rates, defined as the proportion of replications in which the summary test statistic exceeded the standard Gaussian threshold for the nominal two-sided p-value. We considered two null hypotheses. The first homogeneity null hypothesis specifies that the true values of the βk satisfy β1 = … = βK = 0 The second heterogeneity null hypothesis specifies that the βk are a random sample from a Gaussian distribution with mean zero and variance τ2 > 0. Under the homogeneity null hypothesis, we found good agreement between nominal and empirical p-values for all summary test statistics considered. In contrast, under the heterogeneity null hypothesis all summary statistics showed inflated empirical error rates relative to their nominal values, with the extent of inflation somewhat attenuated when the summary statistics were heterogeneity-weighted compared to homogeneity-weighted (data not shown). We shall return to this finding after a comparison of test power.


Case-control data

We simulated the meta-analysis of K = 3 case-control studies having the sample sizes shown in Table 2. We generated genotypes for a single SNP assuming Hardy-Weinberg allele frequencies, and then used a logistic regression model with study-specific intercepts (α1, α2, α3) = (−3.0, −2.5, −2.5) and one of three effect-size triples (β1, β2, β3) to generate case and control phenotypes until we had achieved the targeted numbers of cases and controls. Two triples correspond to effect-size homogeneity, as given by β1 = β2 = β3 = ln (1.3) = 0.26, and β1 = β2 = β3 = ln (1.4) = 0.34. The third triple (β1, β2, β3) = (0, 0.34, 0.26) corresponds to effect-size heterogeneity across the three sites. Power was taken as the proportion of the 1000 replications in which the test statistic exceeded the threshold for a p-value of 10−8.

Table 2
Test Powera in Meta-analysis of Three Case-control Studies (α = 10-8)

Table 2 shows power for summary score and Wald statistics with weights determined by assumptions of homogeneous (ZHM) and heterogeneous (ZHT) effect-sizes, and with size-based weights (ZSZ). As expected given their asymptotic equivalence, the score-based and Wald-based summary statistics performed similarly for any given choice of weights. For all types of data, the homogeneity-weighted summary statistics were most powerful. When the data were generated under the homogeneous effect-size assumption, both homogeneity- and heterogeneity-weighted statistics ZHM and ZHT consistently outperformed the size-weighted statistic ZSZ. However when the data were generated according to heterogeneous effect-sizes, the heterogeneity-weighted statistic ZHT performed poorly compared to the other two. This finding, which is consistent with that reported in other applications [Clayton and Hills, 1993], reflects power loss due to sampling error in estimating the inter-site effect-size variance τ2.

Survival data

We also generated censored survival data for a total of 2,483 subjects of European ancestry from Heidelberg Germany, Ontario Canada, and Seattle US who have been genotyped for 600,000 SNPs using the Illumina 610Quad chip as part of an ongoing GWAS of early-onset breast cancer. To mimic a meta-analysis based on K = 6 study sites, we split the subjects from each of the three regions into two subsets, which yielded six sites with the sample sizes shown in Table 3. We focused on 1000 SNPs in a 6-MB region of chromosome 2 (positions 1,000,713 to 6,488,209). To generate subjects' survival times, we assumed independent exponentially distributed times to failure and censoring with site-specific baseline and censoring hazards λk, μk, k = 1, …, 6, independent of nongenetic covariates. These values are shown in Table 3. Subjects who had not failed by time t = 1 were censored at that time. We assumed that the site-specific failure hazards also depend, via a proportional hazards model, on the minor allele counts of SNP rs17360123 at position 5,531,665 (MAF = 0.2) and SNP rs4072375 at position 3,565,156 (MAF = 0.4). Specifically, we assumed that the failure hazard at site k was λk exp(βkgkj), where gkj is the SNP minor allele count for the jth subject at site k, k = 1, …, 6. For the homogeneity model we took the common β to be 0.25, 0.30 and 0.35, and for the heterogeneity model we took β = 0 for sites 1 and 2, and β = 0.35 for the remaining sites.

Table 3
Sample Sizes and Hazard Rates for Data from Six Survival Studies

Table 4 shows the power of homogeneity-weighted, heterogeneity-weighted and size-weighted summary score and Wald statistics under these generating models. The same patterns found for the case-control data in Table 2 are evident here: the homogeneity weighted summary statistics are consistently most powerful, and when the data were generated with homogeneous SNP effects, the size-weighted statistic is least powerful. In the presence of heterogeneous SNP effects, however, the heterogeneity-weighted test statistic is least powerful, most likely because of power loss associated with error in estimating the variance parameter τ2.

Table 4
Test Powera in Meta-analysis of Six Survival Studies

The poor power of the heterogeneity-weighted test statistics mitigates against their use in GWAS meta-analysis. In general, these statistics are used to guard against false positives when the null hypothesis specifies varying site-specific effect-sizes whose overall mean is zero. However the primary null hypothesis in the GWAS setting specifies zero SNP effect-size at all K sites. This suggests the strategy of testing all SNPs using a homogeneity-weighted summary statistic, and investigating heterogeneity for those SNPs whose p-values suggest departures from this primary null hypothesis.

Finally, we evaluated how imputed genotypes affect the power of the homogeneity-weighted score-based test by masking and imputing some sites' genotypes in regions containing SNPs rs17360123 and rs4072375. These two SNPs were assumed associated with survival with a common effect-size β = 0.35 across the six sites. We masked and imputed either: i) the associated SNPs and their nearest flanking markers (a total of three SNPs imputed); or ii) the associated SNPs and their nearest two flanking markers (a total of five SNPs imputed). SNPs were masked and imputed for: a) one study (study 1); b) three studies (studies 1,3,5); and c) five studies (studies 1-5). We conducted the imputations using the HapMap2 reference panel in an eight megabase region containing 9262 SNPs. The five studies with masked SNPs had estimated correlation coefficients between imputed and actual rs17360123 minor allele counts of ([r with circumflex]1, …, [r with circumflex]5) = (0.62, 0.66, 0.57, 0.56, 0.60) when three SNPs were imputed, and (0.59, 0.58, 0.51, 0.52, 0.49) when five SNPs were imputed. The corresponding values for rs4072375 were (0.82, 0.77, 0.57, 0.62, 0.68) and (0.54, 0, 58, 0, 49, 0.46, 0.45). These estimates were obtained as described by Li and Abecasis [2006]. In total, we considered the 12 data sets described in Table 5. We imputed all masked SNPs using the Caucasian data from the HapMap2 reference panel and the imputation software BEAGLE [Browning BL and Browning SR, 2009] ( We evaluated the power of three statistics adapted from the homogeneity-weighted summary score statistic ZHM. The first is the likelihood-based score statistic given by equations (15, 16) that accounts for imputation uncertainty. The second statistic replaces subjects' unobserved SNP minor allele counts by their posterior means, ignoring imputation uncertainties. The third statistic also is based on subjects posterior mean minor allele counts, but with weights proportional to equation M43, a strategy similar to that proposed by Zaitlen and Eskin [2010].

Table 5
Test Powera in Score-based Meta-analysis of Six Survival Studies with SNP Imputation

Table 5 shows power for each of these three summary statistics as applied to each of the 12 data sets. For comparison, the power for the optimal tests with completely observed data was 0.827 for SNP rs17360123 and 0.954 for SNP rs4012375 (Table 4). Several results are noteworthy. First, as expected, the power of all tests dropped increasingly with the masking of increasingly many SNPs at increasingly many sites. Moreover the rs4012375 SNP suffers more power loss from masking five rather than three SNPs than does rs17360123. This difference mirrors the correspondingly larger loss of imputation accuracy for rs4012375 than rs17360123, as measured by their estimated correlation coefficients, described in the preceding paragraph. Second, substituting posterior means performed as well as the likelihood-based score statistics. In fact, substituting the posterior means for the unobserved genotypes gives exactly the observed data score equation M44 (15), since the complete data score is a linear function of the unobserved genotypes. Moreover the Fisher information computed by substituting posterior means for unobserved genotypes is an approximation to the Fisher information equation M45 for the observed data, given by (16), and it accounts for genotype uncertainty by its reduced inter-subject variation for poorly imputed SNPs. So it is not surprising that these two statistics show similar power. Third, the statistic formed by weighting by equation M46 did not perform as well, most likely due to inaccuracies in the estimates [r with circumflex]k and sensitivity of the weights to such error. These results contrast with those of Zaitlen and Eskin [2010] who found slightly increased power using this strategy; further investigation of this matter would be useful.

Application to GWAS Meta-Analysis

We illustrate the summary score-statistics by application to the meta-analysis of three case-control GWAS studies of schizophrenia in European-Americans described by Shi et al., [2009]. The three studies are the SGENE Consortium [Stefansson et al., 2009], The International Schizophrenia Consortium [ISC; International Schizophrenia Consortium, 2009] and The Molecular Genetics of Schizophrenia study, [MGS: Shi et al., 2009]. These studies enlisted a total of 8,008 cases and 19,077 controls, as shown in Table 6. For the meta-analysis, only site-specific p-values and the directions of association for the logistic trend test were available for each of a common set of SNPs in six regions containing SNPs satisfying a predetermined threshold for statistical significance in at least one of the three studies. For the meta-analysis the authors converted each p-value to a score statistic determined as the quantile corresponding to the appropriate tail of the standard Gaussian distribution. They then evaluated SNPs using the summary score statistic of (17), with weights proportional to the square roots of the estimates î of (9). GWAS tag SNPs are selected to have common frequencies in different Caucasian populations; thus the authors assumed that the SNP MAFs are approximately the same across the three sites, which gives weights wk proportional to [(nk0nk1) / (nk0 + nk1)]1/2.

Table 6
Sample Sizes in Meta-Analysis of K=3 Schizophrenia Case-control GWASa

We computed three test statistics for each of the 28 SNPs in the major histocompatibility complex class I and II identified by the authors as significantly associated with schizophrenia risk. These test statistics are: 1) the homogeneity-weighted statistic (17) used by Shi et al., [2009]; 2) the heterogeneity-weighted statistic (19); and 3) the size-weighted statistic with weights proportional to (nk0 + nk1)1/2. For the heterogeneity-weighted statistic, we found that the estimated inter-study variances [tau]2 were zero for all SNPs, and Cochran's Q test did not reject homogeneity at a significance level of 0.1. (An alternate measure of heterogeneity, defined as I2 = 1 − [(K − 1) /Q], estimates the percentage of total inter-site variation beyond chance, with heterogeneity suggested by values I2 > 0.5 [Ioannidis et al., 2007]. For these data, I2 was negative for all 28 SNPs, showing no evidence of heterogeneity.) Thus the heterogeneity- and homogeneity-weighted score statistics were equal for all 28 SNPs. Figure 1 shows that this summary score statistic yielded smaller p-values than the size-weighted statistic.

Fig. 1
p-values of size-weighted statistics (y-axis) versus summary score statistics with homogeneity weights(x-axis) for 28 top schizophrenia SNPs

This example illustrates three points. First, site-specific data can be combined optimally simply by sharing p-values, the directions of association, and the null information estimates (as obtained from case and control sample sizes). Second, heterogeneity across studies can be evaluated even with such minimal information from the sites. Finally, as expected from the theory and simulations, the optimally-weighted summary score-statistic is more powerful than the size-weighted score-statistic. For case-control data such as these, optimal weighting achieves the greatest power gain relative to size-based weighting when case and control numbers are roughly the same for some studies, but substantially different for others (as in the present example). Power gains also can be substantial for meta-analysis of censored survival data when censoring levels differ among the studies.


We have reviewed SNP association tests based on weighted sums of site-specific score and Wald statistics. We have shown that when SNP effect-size β is constant across sites, the optimal weights for both statistics are approximately proportional to the square roots of the sites' null Fisher information for the SNP of interest, and we have described these weights for the phenotypes commonly studied with GWAS. These optimal weights have simple forms in the absence of site-specific differences in covariates and their effects, and in the absence of accommodation for site-specific differences in imputation uncertainty. Their optimality in these circumstances is supported by simulation results. In the presence of covariate adjustment, the simple weights described here should be nearly optimal unless the sites differ strongly in choice of covariates or their effect-sizes. The simulations also suggest little power loss in using weights that ignore imputation uncertainties. This finding is consistent with the similarities in the two statistics noted in the Methods section.

We also have shown how to use the site-specific score statistics to obtain SNP effect-size estimates for assessing inter-site effect-size heterogeneity. An application to schizophrenia data shows that for case-control GWAS meta-analysis, such heterogeneity can be assessed even when only site-specific SNP p-values and directions of association are available. Both simulations and the data application support the conclusion of others that Cochran's test has poor power to detect effect-size heterogeneity, particularly when the number of sites is small [Clayton and Hills, 1993]. Thus we suggest performing GWAS meta-analysis using the homogeneity-weighted summary statistics and examining heterogeneity for the hit SNPs, with the caveat that lack of evidence for heterogeneity can be interpreted only as ruling out very large differences across sites.

A limitation of all meta-analysis is its need for site-specific estimates of regression coefficients for covariates that are common to all or several sites (e.g. age, sex, ancestry). The simulations of Lin and Zeng [2010a; 2010b] suggest that site-specific regression coefficient estimates for such covariates do not produce substantial bias or power loss. Although this conclusion seems plausible unless the covariates are strongly associated with both genotype and phenotype, more evidence would be useful.


This research was supported by NIH grants R01 CA094069 and U01 CA122171. The authors are grateful to Habibul Ahsan, Irene Andrulis, Jenny Claude-Chang, Mohammed Kibriya, and Kathi Malone for the use of these genotype data, and to the reviewers of an earlier version of the manuscript for their helpful suggestions.

Appendix: Approximating the Asymptotic Mean of the Score uβ

Let U (θ0) = (Uα (θ0), Uβ (θ0))T denote the vector of efficient scores for θ, evaluated at θ0 = (α, 0). Thus E [U (θ0); θ0] = 0. To approximate this expectation over local alternatives for θ, we evaluate it over a distribution governed by a value near to but not equal to θ0. Thus we write θ = θ0 + δ, where δ = (0, β)T, and β is small in absolute value. Using a multivariate version of the argument given by Cox and Hinkley [1979, p 109]; we can write

equation M47

where i (θ0) is the Fisher information for θ evaluated at θ0. Similarly, the covariance matrix of U (θ0) is

equation M48

These results, when used with standard normal theory, show that, conditional on θ0 = ([alpha] (0), 0), the asymptotic distribution of Uβ ([alpha] (0), 0) = Uβ is normal with mean βiβ + o (|β|) and variance iβ+ O (|β|). Here iβ is given by (2) with i(θ) = i (θ0).


  • Breslow NE, Day NE. Statistical methods in cancer research, vol 1, the Analysis of Case-control Studies. Lyon: International Agency for Research on Cancer; 1980.
  • Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84:210–23. [PubMed]
  • Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS results: A review of statistical methods and recommendations for their application. Am J Hum Genet. 2010;86:6–22. [PubMed]
  • Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press; 1993.
  • Cochran BG. The combination of estimates from different experiments. Biometrics. 1954;10:101–29.
  • Cox DR, Hinkley CV. Theoretical statistics. London: Chapman and Hall; 1979.
  • de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17:R122–8. [PMC free article] [PubMed]
  • DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88. [PubMed]
  • Guan Y, Stephens M. Practical issues in imputation-based association mapping. PLoS Genet. 2008;4:e1000279. [PMC free article] [PubMed]
  • Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. [PMC free article] [PubMed]
  • Hu X, Pickering E, Liu YC, Hall S, Fournier H, Katz E, Dechairo B, John S, Van Eerdewegh P, Soares H. Alzheimer's Disease Neuroimaging Initiative. Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS ONE. 2011;6:e16616. [PMC free article] [PubMed]
  • International HapMap Consortium. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61. [PMC free article] [PubMed]
  • International Schizophrenia Consortium. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–52. [PubMed]
  • Ioannidis JP. Non-replication and inconsistency in the genome-wide association setting. Hum Hered. 2007;64:203–13. [PubMed]
  • Ioannidis JP, Patsopoulos NA, Evangelou E. Heterogeneity in meta-analyses of genome-wide association investigations. PLoS One. 2007;2:e841. [PMC free article] [PubMed]
  • Li Y, Abecasis GR. Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006;S79:2290.
  • Lin DY, Sullivan PF. Meta-analysis of genome-wide association studies with overlapping subjects. Am J Hum Genet. 2009;85:862–72. [PubMed]
  • Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet Epidemiol. 2010a;34:60–6. [PubMed]
  • Lin DY, Zeng D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika. 2010b;97:321–32. [PMC free article] [PubMed]
  • Louis TA. Finding the observed information matrix when using the EM algorithm. Jr Stat Soc. 1982;B44:226–33.
  • Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13. [PubMed]
  • McCullagh PJ, Nelder JA. Generalized linear models. 2nd. London: Chapman and Hall; 1989.
  • Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002;70:425–34. [PubMed]
  • Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78:629–44. [PubMed]
  • Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–5. [PMC free article] [PubMed]
  • Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R, Gejman PV. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009;460:753–7. [PMC free article] [PubMed]
  • Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJ, Ravindrarajah R, Ermakov S, Estrada K, Pols HA, Williams FM, McArdle WL, van Meurs JB, Loos RJ, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG, Deloukas P. Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size. PLoS Genet. 2009;5:e1000445. [PMC free article] [PubMed]
  • Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller HJ, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Genetic Risk and Outcome in Psychosis (GROUP) Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petursson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K, Collier DA. Common variants conferring risk of schizophrenia. Nature. 2009;460:744–7. [PMC free article] [PubMed]
  • Tanaka T, Scheet P, Giusti B, Bandinelli S, Piras MG, Usala G, Lai S, Mulas A, Corsi AM, Vestrini A, Sofi F, Gori AM, Abbate R, Guralnik J, Singleton A, Abecasis GR, Schlessinger D, Uda M, Ferrucci L. Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am J Hum Genet. 2009;84:477–82. [PubMed]
  • Willer C, Li Y, Abecasis G. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–1. [PMC free article] [PubMed]
  • Zaitlen N, Eskin E. Imputation aware meta-analysis of genome-wide association studies. Genet Epid. 2010;34:537–42. [PMC free article] [PubMed]
  • Zeggini E, Ioannidis JP. Meta-analysis in genome-wide association studies. Pharmacogenomics. 2009;10:191–201. [PMC free article] [PubMed]
  • Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, Wellcome Trust Case Control Consortium (WTCCC) McCarthy MI, Hattersley AT. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–41. [PubMed]