Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3197760

Formats

Article sections

- Abstract
- Introduction
- Methods
- Meta-Analysis of Data from K Sites
- Simulations
- Application to GWAS Meta-Analysis
- Discussion
- References

Authors

Related links

Genet Epidemiol. Author manuscript; available in PMC 2012 November 1.

Published in final edited form as:

Published online 2011 September 15. doi: 10.1002/gepi.20603

PMCID: PMC3197760

NIHMSID: NIHMS305267

The publisher's final edited version of this article is available at Genet Epidemiol

See other articles in PMC that cite the published article.

Meta-analysis of genome-wide association studies involves testing single nucleotide polymorphisms (SNPs) using summary statistics that are weighted sums of site-specific score or Wald statistics. This approach avoids having to pool individual-level data. We describe the weights that maximize the power of the summary statistics. For small effect-sizes, any choice of weights yields summary Wald and score statistics with the same power, and the optimal weights are proportional to the square roots of the sites' Fisher information for the SNP's regression coefficient. When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its standard deviation. We give simple approximations to the optimal weights for various phenotypes, and show that weights proportional to the square roots of study sizes are suboptimal for data from case-control studies with varying case-control ratios, for quantitative trait data when the trait variance differs across sites, for count data when the site-specific mean counts differ, and for survival data with different proportions of failing subjects. Simulations suggest that weights that accommodate inter-site variation in imputation error give little power gain compared to those obtained ignoring imputation uncertainties. We note advantages to combining site-specific score statistics, and we show how they can be used to assess effect-size heterogeneity across sites. The utility of the summary score statistic is illustrated by application to a meta-analysis of schizophrenia data in which only site-specific p-values and directions of association are available.

Combining data from multiple genome-wide association studies (GWAS) of a common outcome has emerged as a major tool for identifying susceptibility loci for human disease and other conditions [Scott et al., 2007; Zeggini et al., 2007; Shi et al., 2009]. The goals are to discover new variants missed by the individual studies, to identify variants for outcomes not considered when the data were collected (e.g. cancer survival), and, for associated variants, to assess effect-size and its possible variation across sites. A challenge to achieving these goals is the need to assess and accommodate inter-site differences in study characteristics, such as demographic attributes of the populations studied, phenotypic aspects such as disease severity and extent of censoring of survival data, the SNPs included in the genotyping platforms used, and the choice and coding of covariates in need of adjustment.

The pooled analysis of individual-level data on phenotypes, genotypes and covariates has several advantages. Quality control can be implemented uniformly for all sites, and SNP genotypes can be imputed using data from all sites. Covariates common to multiple studies can be coded uniformly and their regression coefficients can be estimated with all available data. Optimal likelihood-based methods can be used to test whether SNP regression coefficients are nonzero and whether they vary across study sites. Offsetting these advantages is the labor involved in assembling the raw data from each site and coding common covariates, and privacy issues which could limit the sharing of genotypes.

An alternative to pooled analysis is the meta-analysis of site-specific test statistics, each having approximately a standard normal distribution under the null hypothesis of no association. Here each site types or imputes genotypes for a common set of SNPs, and then calculates a covariate-adjusted statistic for each SNP. The site-specific statistics are either Wald statistics (ratios of regression coefficient estimates to their estimated standard deviations (SDs), or score statistics (ratios of efficient scores to estimates of their null SDs) [Soranzo et al., 2009; Tanaka et al., 2009]. The coordinating center then computes a weighted sum of these statistics as a summary test statistic for the SNP [Cantor et al., 2010]. For reviews of issues in GWAS meta-analysis, see [Ioannidis 2007; Ioannidis et al., 2007; de Bakker et al., 2008; Guan and Stephen, 2008; Zeggini and Ioannidis, 2009; Cantor et al., 2010].

When SNP effect size is constant across sites, the optimal summary Wald statistic is the well-known inverse-variance-weighted combination of estimated regression coefficients, divided by its estimated standard deviation [Cohran 1954]. However there are practical advantages to combining score statistics rather than Wald statistics for GWAS meta-analyses. First, the score statistics do not require iterative parameter estimates for each SNP to be tested. Estimates for the site-specific covariate effects need be calculated just once under the global null hypothesis of no effect for any SNP; thus score statistics provide fast assessment of the statistical significance of large numbers of SNPs. Second, a SNP score statistic can be computed even when the site has provided only the SNP's p-value and direction of association. Third, the summary score statistic may be easier to interpret when the sites differ with respect to the covariates included in their regression models or with respect to their phenotype definition, measurement or coding. For example, a GWAS of nicotine addiction among current and former smokers might include sites whose outcome is a count of smoking cessation attempts, and other sites that simply classified subjects according to presence or absence of a successful cessation attempt. For a meta-analysis such as this, a combination of site-specific score statistics may be easier to interpret than a combination of site-specific Wald statistics, since the SNP regression coefficients in the latter are not comparable. Finally, site-specific score statistics have straightforward extensions to accommodate uncertainty due to imputation of unobserved genotypes, and we shall show that they can be used to assess heterogeneity of effect-size across sites.

Here we focus on optimal weights for combining site-specific statistics to identify new variants and, for associated variants, to assess possible effect-size variation across sites. Asymptotically optimal weights for combining Wald statistics have been known for more than half a century [Cochran 1954; DerSimonian and Laird, 1986]. Surprisingly however, despite their practical advantages, optimal weights for combining score statistics are not well known. In fact, some investigators suggest or use weights proportional to the square roots of the study-specific sample sizes [Soranzo et al., 2009; Willer et al., 2010; Hu et al., 2011]. We show that this strategy can be suboptimal when combining data from sites whose phenotypes have different null distributions. Instead, for the small effect-sizes expected of GWAS meta-analysis, the optimal weights for combining score statistics are essentially the same as those used to combine Wald statistics. We provide explicit forms for the optimal weights for a general class of phenotypes that includes binary outcomes, quantitative traits, counts of events and censored survival outcomes.

In the following Methods section we begin by reviewing the score and Wald statistics for testing the null hypothesis of no SNP-phenotype association at a single site, with application to case-control studies, quantitative traits, count data and censored survival data. We also show how the site-specific score statistics can be extended to handle imputation uncertainties, as noted by Marchini et al., [2007]. Then, for each of these phenotypes, we describe the optimal weights for combining site-specific score or Wald statistics. We also extend the weights to handle SNP effect-sizes that vary across sites, and we show how the site-specific score statistics can be used to assess inter-site heterogeneity. The Methods section is followed by an evaluation of power for various summary score and Wald statistics, based on simulated case-control and censored survival data. An application to schizophrenia data shows that for binary outcomes, the score statistics can be used to assess inter-site effect-size heterogeneity even when only site-specific SNP p-values and direction of association are available. The final section concludes with a brief discussion.

For simplicity we model the effects of each SNP as additive on an appropriate scale (the so-called trend or gene dosage model), though other genotype models can be used. Let (*y, g, x*) = {(*y _{j}, x_{j}, g_{j}*),

We wish to test the null hypothesis *H*_{0}: *θ* = *θ*_{0} = (*α*, 0). To do so, we introduce the Fisher information for *θ*, given by

(1)

Here *i _{αα}* is the

(2)

evaluated at ((0), 0).

The score and Wald statistics are

(3)

where *î _{β}* is a consistent estimate of

We now specialize the score and Wald statistics to specific phenotypes of relevance to GWAS. We first consider phenotypes whose relation to genotypes and covariates can be described by generalized linear models (GLMs) [McCullagh and Nelder, 1989]. Special cases of particular relevance are the binary phenotypes of case-control studies, quantitative traits such as height or weight, and counts of events such as attempts to quit smoking, and censored survival data.

GLMs assume that the mean but not the dispersion parameter of the phenotype distribution depends on genotypes and covariates, so that their effects can be modeled as a function of a linear predictor *η _{j}* =

(4)

where *a* (·), *b* (·) and *c* (·) are known functions, and *ϕ* is the dispersion parameter of the model. The score for *β* corresponding to (4) is

(5)

where * _{j}* is the fitted mean of

(6)

In the absence of covariates *x _{j}* = 1,

(7)

where *p* = 1 − *q* is the SNP minor allele frequency (MAF). Using (7) in (2), we find that the null Fisher information for *β* is

(8)

For binary phenotypes *y _{j}* is coded as one if subject

Substituting these expressions in (8) gives

(9)

For quantitative traits whose distribution is (possibly after transformation) Gaussian with mean *μ _{j}* =

(10)

where is a consistent estimate of the phenotype variance.

For Poisson count data, *a* (*ϕ*) = 1, *b* (*η _{j}*) =

(11)

Here the data are
where for subject *j, t _{j}* is survival time and

(12)

In the absence of covariates, the null Fisher information for *β*, as given by (2) and (1), is

(13)

Meta-analysis of site-specific statistics requires local imputation at the study sites. This involves calculating posterior probabilities *P _{jg}* for each subject's unobserved SNP minor allele count

(14)

Here *g* denotes a vector of genotypes for the *n _{k}* subjects, and the posterior probabilities Pr (

(15)

where
is the complete data score and the expectation is taken with respect to the posterior distribution Pr (*g**Obs*) for the SNP of interest. The null variance of *U _{β}* is given by (2) and (1), but with the Fisher information for the observed data reduced by a penalty due to the uncertainty in genotypes:

(16)

where *J ^{c}* is the negative of the second derivative matrix for the complete data. The score statistic is

The score *U _{β}* of (5) and (12) for GLMs and survival data depends linearly on the genotypes

Suppose now that each of *K* sites has computed a score or Wald statistic using a regression model that may include an intercept and/or covariates. We wish to combine these site-specific statistics into a single summary statistic to assess the overall evidence for or against association between a SNP and a given phenotype. We assume that subjects participate in only one study (see [Lin and Sullivan, 2009] for methods applicable to studies with overlapping study subjects). We first describe the weights optimal under the assumption that the SNP effect-size *β* is common across sites (homogeneous effect-size), and then consider the weights optimal when the effect-sizes *β*_{1}, …, *β _{K}* vary across sites (heterogeneous effect-size).

Here the null hypothesis to be tested is *H*_{0}: *θ* = *θ*_{0} = (*α*_{1}, …, *α _{K}*, 0), where

(17)

They are asymptotically equivalent for testing *θ* = *θ*_{0} under the null hypothesis and under local alternatives. Replacing the the terms *Z _{k:score}* with the score statistics of (3) in the first equality of (17) shows that the summary

For case-control data, *Z _{score}* is the Mantel-Haenzel test for trend in a 2 ×

(18)

The term “inverse-variance-weighted” refers to the fact is a weighted sum of the site-specific regression coefficients * _{k}*, weighted in inverse proportion to their estimated variances:

Table 1 gives the optimal weights for case-control data, quantitative trait data, Poisson count data and censored survival data, when the participating sites do not include covariates in the model, and do not adjust for imputation. The weight *w _{k}* for a given site and phenotype is proportional to the square root of the corresponding estimated null information

In the presence of site-specific adjustment for covariates and/or imputation uncertainty, the optimal weights are more complex than those shown in Table 1. According to (17), they are proportional to the square roots of the null SNP information *î _{kβ}*, which depend on the covariate distributions, covariate effects, and imputation uncertainty specific to each site. In principal, the values

Site-specific effect-size variation is plausible for several reasons. First, the studies may have used different genotyping platforms with different numbers and locations of SNPs. This results in differences across sites in levels of genotype measurement error, creating different levels of regression coefficient attenuation across sites. Second, when covariates or their coding differ across sites, different levels of control for confounding can affect SNP effect-sizes. For example, genetic ancestry may be associated with both phenotype and genotypes, and variable control for ancestry may induce differential levels of residual confounding. Third, for survival data, site-specific variation may occur if effect-size changes over time and the studies differ in level and type of censoring. Finally, the SNP of interest may interact with other risk factors whose prevalences vary across sites, which would cause variation in effect-size.

The site-specific Wald statistics have the form *Z _{k:Wald}* =

For either the score-based or Wald-based estimates _{1}, …, * _{K}*, Cochran's test is based on the statistic

where
, *r* = 1, 2. The overall effect-size is estimated as

Corresponding summary statistics for testing *β*_{1} = … = *β _{K}* = 0 are

(19)

and *Z _{k}* denotes either

In conclusion, one can choose between four types of summary statistics, each formed as a weighted sum of site-specific statistics. The four summary statistics differ according to the site-specific statistics (score *vs* Wald) and whether the weights are chosen to be optimal under effect-size homogeneity (*Z _{HM}*) or heterogeneity (

We simulated case-control and survival data to evaluate the performances of the optimal summary score and Wald statistics, and to compare them to that of the size-weighted summary *Z _{SZ}*, obtained using weights proportional to the square roots of study sizes. For each replication, we generated data assuming both homogeneous and heterogeneous effect-sizes, with and without imputation error, and then computed the various summary statistics. We flagged those exceeding certain thresholds, and then computed for each statistic the proportion of all replications in which it was flagged.

To evaluate type 1 error, we compared nominal to empirical type-1 error rates, defined as the proportion of replications in which the summary test statistic exceeded the standard Gaussian threshold for the nominal two-sided p-value. We considered two null hypotheses. The first *homogeneity* null hypothesis specifies that the true values of the *β _{k}* satisfy

We simulated the meta-analysis of *K* = 3 case-control studies having the sample sizes shown in Table 2. We generated genotypes for a single SNP assuming Hardy-Weinberg allele frequencies, and then used a logistic regression model with study-specific intercepts (*α*_{1}, *α*_{2}, *α*_{3}) = (−3.0, −2.5, −2.5) and one of three effect-size triples (*β*_{1}, *β*_{2}, *β*_{3}) to generate case and control phenotypes until we had achieved the targeted numbers of cases and controls. Two triples correspond to effect-size homogeneity, as given by *β*_{1} = *β*_{2} = *β*_{3} = ln (1.3) = 0.26, and *β*_{1} = *β*_{2} = *β*_{3} = ln (1.4) = 0.34. The third triple (*β*_{1}, *β*_{2}, *β*_{3}) = (0, 0.34, 0.26) corresponds to effect-size heterogeneity across the three sites. Power was taken as the proportion of the 1000 replications in which the test statistic exceeded the threshold for a p-value of 10^{−8}.

Table 2 shows power for summary score and Wald statistics with weights determined by assumptions of homogeneous (*Z _{HM}*) and heterogeneous (

We also generated censored survival data for a total of 2,483 subjects of European ancestry from Heidelberg Germany, Ontario Canada, and Seattle US who have been genotyped for 600,000 SNPs using the Illumina 610Quad chip as part of an ongoing GWAS of early-onset breast cancer. To mimic a meta-analysis based on *K* = 6 study sites, we split the subjects from each of the three regions into two subsets, which yielded six sites with the sample sizes shown in Table 3. We focused on 1000 SNPs in a 6-MB region of chromosome 2 (positions 1,000,713 to 6,488,209). To generate subjects' survival times, we assumed independent exponentially distributed times to failure and censoring with site-specific baseline and censoring hazards *λ _{k}, μ_{k}, k* = 1, …, 6, independent of nongenetic covariates. These values are shown in Table 3. Subjects who had not failed by time

Table 4 shows the power of homogeneity-weighted, heterogeneity-weighted and size-weighted summary score and Wald statistics under these generating models. The same patterns found for the case-control data in Table 2 are evident here: the homogeneity weighted summary statistics are consistently most powerful, and when the data were generated with homogeneous SNP effects, the size-weighted statistic is least powerful. In the presence of heterogeneous SNP effects, however, the heterogeneity-weighted test statistic is least powerful, most likely because of power loss associated with error in estimating the variance parameter *τ*^{2}.

The poor power of the heterogeneity-weighted test statistics mitigates against their use in GWAS meta-analysis. In general, these statistics are used to guard against false positives when the null hypothesis specifies varying site-specific effect-sizes whose overall mean is zero. However the primary null hypothesis in the GWAS setting specifies zero SNP effect-size at all *K* sites. This suggests the strategy of testing all SNPs using a homogeneity-weighted summary statistic, and investigating heterogeneity for those SNPs whose p-values suggest departures from this primary null hypothesis.

Finally, we evaluated how imputed genotypes affect the power of the homogeneity-weighted score-based test by masking and imputing some sites' genotypes in regions containing SNPs rs17360123 and rs4072375. These two SNPs were assumed associated with survival with a common effect-size *β* = 0.35 across the six sites. We masked and imputed either: i) the associated SNPs and their nearest flanking markers (a total of three SNPs imputed); or ii) the associated SNPs and their nearest two flanking markers (a total of five SNPs imputed). SNPs were masked and imputed for: a) one study (study 1); b) three studies (studies 1,3,5); and c) five studies (studies 1-5). We conducted the imputations using the HapMap2 reference panel in an eight megabase region containing 9262 SNPs. The five studies with masked SNPs had estimated correlation coefficients between imputed and actual rs17360123 minor allele counts of (_{1}, …, _{5}) = (0.62, 0.66, 0.57, 0.56, 0.60) when three SNPs were imputed, and (0.59, 0.58, 0.51, 0.52, 0.49) when five SNPs were imputed. The corresponding values for rs4072375 were (0.82, 0.77, 0.57, 0.62, 0.68) and (0.54, 0, 58, 0, 49, 0.46, 0.45). These estimates were obtained as described by Li and Abecasis [2006]. In total, we considered the 12 data sets described in Table 5. We imputed all masked SNPs using the Caucasian data from the HapMap2 reference panel and the imputation software BEAGLE [Browning BL and Browning SR, 2009] (http://faculty.washington.edu/browning/beagle/beagle.html). We evaluated the power of three statistics adapted from the homogeneity-weighted summary score statistic *Z _{HM}*. The first is the likelihood-based score statistic given by equations (15, 16) that accounts for imputation uncertainty. The second statistic replaces subjects' unobserved SNP minor allele counts by their posterior means, ignoring imputation uncertainties. The third statistic also is based on subjects posterior mean minor allele counts, but with weights proportional to
, a strategy similar to that proposed by Zaitlen and Eskin [2010].

Table 5 shows power for each of these three summary statistics as applied to each of the 12 data sets. For comparison, the power for the optimal tests with completely observed data was 0.827 for SNP rs17360123 and 0.954 for SNP rs4012375 (Table 4). Several results are noteworthy. First, as expected, the power of all tests dropped increasingly with the masking of increasingly many SNPs at increasingly many sites. Moreover the rs4012375 SNP suffers more power loss from masking five rather than three SNPs than does rs17360123. This difference mirrors the correspondingly larger loss of imputation accuracy for rs4012375 than rs17360123, as measured by their estimated correlation coefficients, described in the preceding paragraph. Second, substituting posterior means performed as well as the likelihood-based score statistics. In fact, substituting the posterior means for the unobserved genotypes gives exactly the observed data score
(15), since the complete data score is a linear function of the unobserved genotypes. Moreover the Fisher information computed by substituting posterior means for unobserved genotypes is an approximation to the Fisher information
for the observed data, given by (16), and it accounts for genotype uncertainty by its reduced inter-subject variation for poorly imputed SNPs. So it is not surprising that these two statistics show similar power. Third, the statistic formed by weighting by
did not perform as well, most likely due to inaccuracies in the estimates * _{k}* and sensitivity of the weights to such error. These results contrast with those of Zaitlen and Eskin [2010] who found slightly increased power using this strategy; further investigation of this matter would be useful.

We illustrate the summary score-statistics by application to the meta-analysis of three case-control GWAS studies of schizophrenia in European-Americans described by Shi et al., [2009]. The three studies are the SGENE Consortium [Stefansson et al., 2009], The International Schizophrenia Consortium [ISC; International Schizophrenia Consortium, 2009] and The Molecular Genetics of Schizophrenia study, [MGS: Shi et al., 2009]. These studies enlisted a total of 8,008 cases and 19,077 controls, as shown in Table 6. For the meta-analysis, only site-specific p-values and the directions of association for the logistic trend test were available for each of a common set of SNPs in six regions containing SNPs satisfying a predetermined threshold for statistical significance in at least one of the three studies. For the meta-analysis the authors converted each p-value to a score statistic determined as the quantile corresponding to the appropriate tail of the standard Gaussian distribution. They then evaluated SNPs using the summary score statistic of (17), with weights proportional to the square roots of the estimates *î _{kβ}* of (9). GWAS tag SNPs are selected to have common frequencies in different Caucasian populations; thus the authors assumed that the SNP MAFs are approximately the same across the three sites, which gives weights

We computed three test statistics for each of the 28 SNPs in the major histocompatibility complex class I and II identified by the authors as significantly associated with schizophrenia risk. These test statistics are: 1) the homogeneity-weighted statistic (17) used by Shi et al., [2009]; 2) the heterogeneity-weighted statistic (19); and 3) the size-weighted statistic with weights proportional to (*n*_{k0} + *n _{k}*

p-values of size-weighted statistics (y-axis) versus summary score statistics with homogeneity weights(x-axis) for 28 top schizophrenia SNPs

This example illustrates three points. First, site-specific data can be combined optimally simply by sharing p-values, the directions of association, and the null information estimates (as obtained from case and control sample sizes). Second, heterogeneity across studies can be evaluated even with such minimal information from the sites. Finally, as expected from the theory and simulations, the optimally-weighted summary score-statistic is more powerful than the size-weighted score-statistic. For case-control data such as these, optimal weighting achieves the greatest power gain relative to size-based weighting when case and control numbers are roughly the same for some studies, but substantially different for others (as in the present example). Power gains also can be substantial for meta-analysis of censored survival data when censoring levels differ among the studies.

We have reviewed SNP association tests based on weighted sums of site-specific score and Wald statistics. We have shown that when SNP effect-size *β* is constant across sites, the optimal weights for both statistics are approximately proportional to the square roots of the sites' null Fisher information for the SNP of interest, and we have described these weights for the phenotypes commonly studied with GWAS. These optimal weights have simple forms in the absence of site-specific differences in covariates and their effects, and in the absence of accommodation for site-specific differences in imputation uncertainty. Their optimality in these circumstances is supported by simulation results. In the presence of covariate adjustment, the simple weights described here should be nearly optimal unless the sites differ strongly in choice of covariates or their effect-sizes. The simulations also suggest little power loss in using weights that ignore imputation uncertainties. This finding is consistent with the similarities in the two statistics noted in the Methods section.

We also have shown how to use the site-specific score statistics to obtain SNP effect-size estimates for assessing inter-site effect-size heterogeneity. An application to schizophrenia data shows that for case-control GWAS meta-analysis, such heterogeneity can be assessed even when only site-specific SNP p-values and directions of association are available. Both simulations and the data application support the conclusion of others that Cochran's test has poor power to detect effect-size heterogeneity, particularly when the number of sites is small [Clayton and Hills, 1993]. Thus we suggest performing GWAS meta-analysis using the homogeneity-weighted summary statistics and examining heterogeneity for the hit SNPs, with the caveat that lack of evidence for heterogeneity can be interpreted only as ruling out very large differences across sites.

A limitation of all meta-analysis is its need for site-specific estimates of regression coefficients for covariates that are common to all or several sites (e.g. age, sex, ancestry). The simulations of Lin and Zeng [2010a; 2010b] suggest that site-specific regression coefficient estimates for such covariates do not produce substantial bias or power loss. Although this conclusion seems plausible unless the covariates are strongly associated with both genotype and phenotype, more evidence would be useful.

This research was supported by NIH grants R01 CA094069 and U01 CA122171. The authors are grateful to Habibul Ahsan, Irene Andrulis, Jenny Claude-Chang, Mohammed Kibriya, and Kathi Malone for the use of these genotype data, and to the reviewers of an earlier version of the manuscript for their helpful suggestions.

Let *U* (*θ*_{0}) = (*U _{α}* (

where *i* (*θ*_{0}) is the Fisher information for *θ* evaluated at *θ*_{0}. Similarly, the covariance matrix of *U* (*θ*_{0}) is

These results, when used with standard normal theory, show that, conditional on *θ*_{0} = ( (0), 0), the asymptotic distribution of *U _{β}* ( (0), 0) =

- Breslow NE, Day NE. Statistical methods in cancer research, vol 1, the Analysis of Case-control Studies. Lyon: International Agency for Research on Cancer; 1980.
- Browning BL, Browning SR. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009;84:210–23. [PubMed]
- Cantor RM, Lange K, Sinsheimer JS. Prioritizing GWAS results: A review of statistical methods and recommendations for their application. Am J Hum Genet. 2010;86:6–22. [PubMed]
- Clayton D, Hills M. Statistical models in epidemiology. Oxford: Oxford University Press; 1993.
- Cochran BG. The combination of estimates from different experiments. Biometrics. 1954;10:101–29.
- Cox DR, Hinkley CV. Theoretical statistics. London: Chapman and Hall; 1979.
- de Bakker PI, Ferreira MA, Jia X, Neale BM, Raychaudhuri S, Voight BF. Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008;17:R122–8. [PMC free article] [PubMed]
- DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–88. [PubMed]
- Guan Y, Stephens M. Practical issues in imputation-based association mapping. PLoS Genet. 2008;4:e1000279. [PMC free article] [PubMed]
- Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5:e1000529. [PMC free article] [PubMed]
- Hu X, Pickering E, Liu YC, Hall S, Fournier H, Katz E, Dechairo B, John S, Van Eerdewegh P, Soares H. Alzheimer's Disease Neuroimaging Initiative. Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS ONE. 2011;6:e16616. [PMC free article] [PubMed]
- International HapMap Consortium. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61. [PMC free article] [PubMed]
- International Schizophrenia Consortium. Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, Sullivan PF, Sklar P. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–52. [PubMed]
- Ioannidis JP. Non-replication and inconsistency in the genome-wide association setting. Hum Hered. 2007;64:203–13. [PubMed]
- Ioannidis JP, Patsopoulos NA, Evangelou E. Heterogeneity in meta-analyses of genome-wide association investigations. PLoS One. 2007;2:e841. [PMC free article] [PubMed]
- Li Y, Abecasis GR. Rapid haplotype reconstruction and missing genotype inference. Am J Hum Genet. 2006;S79:2290.
- Lin DY, Sullivan PF. Meta-analysis of genome-wide association studies with overlapping subjects. Am J Hum Genet. 2009;85:862–72. [PubMed]
- Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet Epidemiol. 2010a;34:60–6. [PubMed]
- Lin DY, Zeng D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika. 2010b;97:321–32. [PMC free article] [PubMed]
- Louis TA. Finding the observed information matrix when using the EM algorithm. Jr Stat Soc. 1982;B44:226–33.
- Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–13. [PubMed]
- McCullagh PJ, Nelder JA. Generalized linear models. 2nd. London: Chapman and Hall; 1989.
- Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002;70:425–34. [PubMed]
- Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006;78:629–44. [PubMed]
- Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, Li Y, Duren WL, Erdos MR, Stringham HM, Chines PS, Jackson AU, Prokunina-Olsson L, Ding CJ, Swift AJ, Narisu N, Hu T, Pruim R, Xiao R, Li XY, Conneely KN, Riebow NL, Sprau AG, Tong M, White PP, Hetrick KN, Barnhart MW, Bark CW, Goldstein JL, Watkins L, Xiang F, Saramies J, Buchanan TA, Watanabe RM, Valle TT, Kinnunen L, Abecasis GR, Pugh EW, Doheny KF, Bergman RN, Tuomilehto J, Collins FS, Boehnke M. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316:1341–5. [PMC free article] [PubMed]
- Shi J, Levinson DF, Duan J, Sanders AR, Zheng Y, Pe'er I, Dudbridge F, Holmans PA, Whittemore AS, Mowry BJ, Olincy A, Amin F, Cloninger CR, Silverman JM, Buccola NG, Byerley WF, Black DW, Crowe RR, Oksenberg JR, Mirel DB, Kendler KS, Freedman R, Gejman PV. Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 2009;460:753–7. [PMC free article] [PubMed]
- Soranzo N, Rivadeneira F, Chinappen-Horsley U, Malkina I, Richards JB, Hammond N, Stolk L, Nica A, Inouye M, Hofman A, Stephens J, Wheeler E, Arp P, Gwilliam R, Jhamai PM, Potter S, Chaney A, Ghori MJ, Ravindrarajah R, Ermakov S, Estrada K, Pols HA, Williams FM, McArdle WL, van Meurs JB, Loos RJ, Dermitzakis ET, Ahmadi KR, Hart DJ, Ouwehand WH, Wareham NJ, Barroso I, Sandhu MS, Strachan DP, Livshits G, Spector TD, Uitterlinden AG, Deloukas P. Meta-analysis of genome-wide scans for human adult stature identifies novel Loci and associations with measures of skeletal frame size. PLoS Genet. 2009;5:e1000445. [PMC free article] [PubMed]
- Stefansson H, Ophoff RA, Steinberg S, Andreassen OA, Cichon S, Rujescu D, Werge T, Pietiläinen OP, Mors O, Mortensen PB, Sigurdsson E, Gustafsson O, Nyegaard M, Tuulio-Henriksson A, Ingason A, Hansen T, Suvisaari J, Lonnqvist J, Paunio T, Børglum AD, Hartmann A, Fink-Jensen A, Nordentoft M, Hougaard D, Norgaard-Pedersen B, Böttcher Y, Olesen J, Breuer R, Möller HJ, Giegling I, Rasmussen HB, Timm S, Mattheisen M, Bitter I, Réthelyi JM, Magnusdottir BB, Sigmundsson T, Olason P, Masson G, Gulcher JR, Haraldsson M, Fossdal R, Thorgeirsson TE, Thorsteinsdottir U, Ruggeri M, Tosato S, Franke B, Strengman E, Kiemeney LA, Genetic Risk and Outcome in Psychosis (GROUP) Melle I, Djurovic S, Abramova L, Kaleda V, Sanjuan J, de Frutos R, Bramon E, Vassos E, Fraser G, Ettinger U, Picchioni M, Walker N, Toulopoulou T, Need AC, Ge D, Yoon JL, Shianna KV, Freimer NB, Cantor RM, Murray R, Kong A, Golimbet V, Carracedo A, Arango C, Costas J, Jönsson EG, Terenius L, Agartz I, Petursson H, Nöthen MM, Rietschel M, Matthews PM, Muglia P, Peltonen L, St Clair D, Goldstein DB, Stefansson K, Collier DA. Common variants conferring risk of schizophrenia. Nature. 2009;460:744–7. [PMC free article] [PubMed]
- Tanaka T, Scheet P, Giusti B, Bandinelli S, Piras MG, Usala G, Lai S, Mulas A, Corsi AM, Vestrini A, Sofi F, Gori AM, Abbate R, Guralnik J, Singleton A, Abecasis GR, Schlessinger D, Uda M, Ferrucci L. Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am J Hum Genet. 2009;84:477–82. [PubMed]
- Willer C, Li Y, Abecasis G. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26:2190–1. [PMC free article] [PubMed]
- Zaitlen N, Eskin E. Imputation aware meta-analysis of genome-wide association studies. Genet Epid. 2010;34:537–42. [PMC free article] [PubMed]
- Zeggini E, Ioannidis JP. Meta-analysis in genome-wide association studies. Pharmacogenomics. 2009;10:191–201. [PMC free article] [PubMed]
- Zeggini E, Weedon MN, Lindgren CM, Frayling TM, Elliott KS, Lango H, Timpson NJ, Perry JR, Rayner NW, Freathy RM, Barrett JC, Shields B, Morris AP, Ellard S, Groves CJ, Harries LW, Marchini JL, Owen KR, Knight B, Cardon LR, Walker M, Hitman GA, Morris AD, Doney AS, Wellcome Trust Case Control Consortium (WTCCC) McCarthy MI, Hattersley AT. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316:1336–41. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |