Consider a meta-analysis of *N* GWA studies, not necessarily typed using the same genotyping product or imputed to the same reference panel. We assume that studies have been filtered using appropriate quality control metrics to exclude poorly genotyped or imputed SNPs [12]. For each study, the following information is required for each good quality SNP: (i) the marker identifier; (ii) the allelic effect estimate and corresponding standard error (or an allelic odds ratio and 95% confidence interval in the case of a dichotomous trait); and (iii) the allele for which the effect has been estimated and the corresponding non-reference allele. Optionally, users may provide: (i) the frequency of the reference allele and the strand to which it has been aligned, which may aid alignment of A/T and G/C SNPs; (ii) the sample size contributing to the effect estimate; and (iii) an indicator of whether the SNP has been directly genotyped in the study or imputed from a reference panel.
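As an illustration of how the two accepted input forms for a dichotomous trait relate, the sketch below converts an odds ratio and its 95% confidence interval into a log-odds ratio and standard error. It assumes a Wald-type interval that is symmetric on the log scale; the function name `log_or_and_se` is hypothetical and is not part of GWAMA.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_lower, ci_upper):
    """Convert an odds ratio and its 95% CI to (log-odds ratio, standard error).

    Assumption: the interval is symmetric on the log scale (Wald-type).
    """
    if min(odds_ratio, ci_lower, ci_upper) <= 0:
        raise ValueError("odds ratios and confidence limits must be positive")
    beta = math.log(odds_ratio)
    # The interval spans 2 * Z_95 standard errors on the log scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return beta, se
```

Rejecting non-positive values mirrors the kind of input check described for odds ratios, although here it is raised as an exception rather than written to a log file.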

GWAMA begins by aligning all studies to the same reference allele at each SNP. If strand information is provided, a log file records potential misalignments and any corrections made on the basis of the reference alleles provided. Fixed-effects meta-analysis is then performed for each SNP by combining allelic effects weighted by the inverse of their variance. The software performs tests of heterogeneity of effects across studies, and reports simple summaries of the direction of effect in each study to highlight potential inconsistencies in results. In the presence of heterogeneity of effects between studies, GWAMA can perform random-effects meta-analysis for each SNP by calculating the random-effects variance component. Graphical summaries of the results of the meta-analysis can be generated using the output of GWAMA, in conjunction with accompanying R scripts [10], provided that a map file containing SNP identifiers, chromosomes and locations is specified. A dense map file is provided with the GWAMA software, which includes SNPs incorporated on a wide range of GWA genotyping products and variants present on the Phase 2 HapMap reference panel [7].

File formatting prior to meta-analysis

GWAMA is distributed with PERL scripts to format output from GWA association tools including PLINK [13] and SNPTEST [14]. The scripts extract the appropriate summary statistics from the output of these analysis packages, and allow subsequent filtering of results to exclude SNPs on the basis of minor allele frequency and/or number of called genotypes. However, we assume that studies have been otherwise filtered for appropriate quality control metrics to exclude poorly genotyped or imputed SNPs [12].

Study alignment and error trapping

GWAMA initially checks input data files for errors, such as negative values for odds ratios, and reports any issues to the log file; the affected study is then excluded from the meta-analysis for that SNP. The reference allele reported in the first study for each SNP is taken as reference, to which all allelic effects are then aligned (Table 1). If studies include estimates of the reference allele frequency, large discrepancies (more than 30%) are reported to the log file for manual checking. If strand information is not provided for studies, GWAMA assumes that alleles are aligned to the forward (+) strand of the NCBI dbSNP database. For SNPs other than A/T or G/C, strand misspecification is reported to the log file and corrected before inclusion in the meta-analysis (Table 1). For A/T and G/C SNPs, strand errors cannot be detected, and all studies are assumed to have provided the correct alignment. However, to overcome potential strand issues for these SNPs, it is recommended that users provide reference allele frequency estimates, so that any large discrepancies between studies can be reported for manual checking.

**Table 1.** Example of alignment of allelic effects and error trapping for a single SNP in a meta-analysis of five studies of a dichotomous phenotype.
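The alignment rules described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than GWAMA's actual implementation: alleles are given as single upper-case letters, the function name `align_effect` and its return convention are hypothetical, and A/T and G/C SNPs are assumed to be on the correct strand, as in the text.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_effect(ref, other, study_ea, study_nea, beta):
    """Align one study's effect to reference allele `ref` (non-reference `other`).

    Returns (aligned_beta, note); aligned_beta is None if alleles cannot be reconciled.
    """
    # For A/T and G/C SNPs the strand cannot be inferred from the alleles.
    ambiguous = COMPLEMENT[ref] == other
    note = "ok"
    if {study_ea, study_nea} != {ref, other} and not ambiguous:
        # Alleles do not match as given: try the opposite strand.
        study_ea, study_nea = COMPLEMENT[study_ea], COMPLEMENT[study_nea]
        note = "strand flipped"
    if (study_ea, study_nea) == (ref, other):
        return beta, note
    if (study_ea, study_nea) == (other, ref):
        # Effect was estimated for the other allele: reverse its sign.
        return -beta, note + ", effect sign reversed"
    return None, "allele mismatch"
```

For example, with reference alleles A/G, a study reporting T/C is treated as a strand flip and keeps its effect, while a study reporting G/A has the sign of its log-odds ratio reversed.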

Fixed-effects meta-analysis

Let $\beta_{ij}$ denote the strand-aligned effect (log-odds ratio for a dichotomous phenotype) of the reference allele at the $j$th SNP in the $i$th study. The combined allelic effect across all studies at the $j$th SNP is then given by

$$\beta_j = \frac{\sum_{i=1}^{N} \beta_{ij} w_{ij}}{\sum_{i=1}^{N} w_{ij}},$$

where $w_{ij} = [\mathrm{Var}(\beta_{ij})]^{-1}$ is the inverse of the variance of the estimated allelic effect in the $i$th study, obtained from the standard error (or 95% confidence interval of the odds ratio for a dichotomous phenotype). Note that if the $j$th SNP has not been directly genotyped or imputed as part of the $i$th study, $w_{ij} = 0$. The variance of the combined allelic effect across studies is given by

$$V_j = \left[\sum_{i=1}^{N} w_{ij}\right]^{-1}.$$

Furthermore, the statistic $\beta_j^2 / V_j$ has an approximate $\chi^2$ distribution with one degree of freedom, and this provides the basis of a test of association of the trait with the $j$th SNP over all studies.
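A minimal sketch of the inverse-variance weighted fixed-effects calculation for a single SNP is given below. The function name `fixed_effects` and the use of `None` to mark a study that does not report the SNP are illustrative assumptions; the p-value uses the identity that the upper tail of a one-degree-of-freedom chi-square at $x$ equals $\mathrm{erfc}(\sqrt{x/2})$.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis for one SNP.

    `betas` and `ses` hold per-study effects and standard errors; None marks a
    study that does not report the SNP (equivalent to a weight of zero).
    """
    pairs = [(b, s) for b, s in zip(betas, ses) if b is not None and s is not None]
    if not pairs:
        raise ValueError("no study reports this SNP")
    weights = [1.0 / (s * s) for _, s in pairs]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, pairs)) / total
    variance = 1.0 / total
    chi2 = beta * beta / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square with 1 df
    return beta, variance, chi2, p_value
```

With two studies reporting effects 0.1 and 0.3, each with standard error 0.1, the weights are equal, so the combined effect is their mean, 0.2, with variance 0.005.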

Correcting for population structure

The presence of population structure in a GWA study that is not accounted for in the analysis can lead to over-dispersion of the corresponding association test statistics. One approach to this problem is to correct test statistics by the genomic control inflation factor. This factor is given by the median of the test statistics, divided by its expectation under the null hypothesis of no association, which is 0.456 in the context of an allelic-effect based analysis [11]. Users have the option to correct each study for potential population structure. In this case, the genomic control inflation factor is calculated separately for directly genotyped and imputed SNPs, denoted $\lambda_{Di}$ and $\lambda_{D^{*}i}$, respectively, for the $i$th study [4, 15]. The variance of the allelic effect at each SNP in the study is then inflated by the relevant genomic control inflation factor, so that the weight given to the $i$th study at the $j$th SNP becomes

$$w_{ij} = [\lambda_{Ki}\,\mathrm{Var}(\beta_{ij})]^{-1},$$

where $K$ is replaced by $D$ or $D^{*}$, as appropriate. Furthermore, users have the option of correcting for between-study variation across the meta-analysis, so that the variance of the combined allelic effect at the $j$th SNP, $V_j$, is replaced by $\lambda V_j$. In this expression, $\lambda$ is the genomic control inflation factor over all meta-analysed association test statistics, genome-wide.
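The genomic control calculation can be sketched as below. The function names are hypothetical, the null median of 0.456 is taken from the text, and leaving standard errors unchanged when the inflation factor is at most 1 is a common convention assumed here, not something the text states.

```python
import statistics

CHI2_1DF_NULL_MEDIAN = 0.456  # expected median under the null, as quoted in the text


def genomic_control_lambda(chi2_stats):
    """Genomic control inflation factor: median statistic over its null expectation."""
    return statistics.median(chi2_stats) / CHI2_1DF_NULL_MEDIAN


def inflate_standard_errors(ses, lam):
    """Inflate variances by lambda, i.e. standard errors by sqrt(lambda).

    Assumption: no correction is applied when lambda <= 1.
    """
    factor = max(lam, 1.0) ** 0.5
    return [se * factor for se in ses]
```

To follow the per-study correction described above, `genomic_control_lambda` would be called twice for each study, once on the statistics of directly genotyped SNPs and once on those of imputed SNPs, and each set of standard errors inflated by its own factor.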

Testing for heterogeneity between studies

To test for consistency of allelic effects across studies at the same SNP, GWAMA calculates two summary statistics of heterogeneity [16]. Cochran's statistic

$$Q_j = \sum_{i=1}^{N} w_{ij} (\beta_{ij} - \beta_j)^2,$$

where $\beta_j$ is the combined fixed-effects estimate and $w_{ij}$ is the inverse-variance weight of the $i$th study, provides a test of heterogeneity of allelic effects at the $j$th SNP, and has an approximate $\chi^2$ distribution with $N_j - 1$ degrees of freedom under the null hypothesis of consistency, where $N_j$ denotes the number of studies for which an allelic effect is reported. An alternative statistic,

$$I_j^2 = 100\% \times \frac{Q_j - (N_j - 1)}{Q_j},$$

quantifies the extent of heterogeneity in allelic effects across studies, over and above that expected by chance, and is more robust than $Q_j$ to variability in the number of studies included in the meta-analysis [17, 18].
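Both heterogeneity statistics can be computed from the per-study effects and standard errors of the studies reporting a SNP, as in the sketch below. The function name `heterogeneity` is hypothetical, and truncating $I^2$ at zero follows the conventional definition of the statistic, which is assumed here.

```python
def heterogeneity(betas, ses):
    """Cochran's Q, its degrees of freedom, and I^2 (percent) for one SNP.

    `betas` and `ses` should contain only the studies that report the SNP.
    """
    weights = [1.0 / (s * s) for s in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # Assumption: negative values of I^2 are set to zero.
    i2 = max(0.0, 100.0 * (q - df) / q) if q > 0 else 0.0
    return q, df, i2
```

For two studies with effects 0.1 and 0.3 and standard errors of 0.1, this gives $Q = 2$ on one degree of freedom and $I^2 = 50\%$.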

Random-effects meta-analysis

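In the presence of heterogeneity of allelic effects between studies, it is common to perform random-effects meta-analysis in order to correct the deflation in the variance of the fixed-effects estimate [19]. The random-effects variance component at the $j$th SNP is given by

$$\tau_j^2 = \max\left(0,\; \frac{Q_j - (N_j - 1)}{\sum_{i} w_{ij} - \sum_{i} w_{ij}^2 \big/ \sum_{i} w_{ij}}\right),$$

where $Q_j$ is Cochran's statistic, $N_j$ is the number of studies reporting an allelic effect at the SNP, and $w_{ij}$ are the fixed-effects inverse-variance weights. It is used to inflate the variance of the estimated allelic effect in each study. The combined allelic effect across all studies at the SNP is then given by

$$\beta_j^{*} = \frac{\sum_{i} \beta_{ij} w_{ij}^{*}}{\sum_{i} w_{ij}^{*}},$$

where $w_{ij}^{*} = [w_{ij}^{-1} + \tau_j^2]^{-1}$ (and $w_{ij}^{*} = 0$ for studies that do not report the SNP). The variance of the combined allelic effect across studies is given by

$$V_j^{*} = \left[\sum_{i} w_{ij}^{*}\right]^{-1}.$$

As in the fixed-effects meta-analysis, the statistic $(\beta_j^{*})^2 / V_j^{*}$ has an approximate $\chi^2$ distribution with one degree of freedom, and this provides the basis of a test of association of the trait with the $j$th SNP, allowing for heterogeneity of allelic effects between studies.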
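A minimal sketch of the random-effects calculation for one SNP follows. It assumes the DerSimonian-Laird moment estimator of the variance component, truncated at zero; the function name `random_effects` is hypothetical and the inputs are restricted to studies that report the SNP.

```python
def random_effects(betas, ses):
    """Random-effects meta-analysis for one SNP (DerSimonian-Laird, assumed).

    Returns (combined effect, its variance, tau^2, chi-square statistic).
    """
    w = [1.0 / (s * s) for s in ses]
    sw = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    denom = sw - sum(wi * wi for wi in w) / sw
    # The variance component is truncated at zero; a single study gives denom == 0.
    tau2 = max(0.0, (q - df) / denom) if denom > 0 else 0.0
    # Inflate each study's variance by tau^2 before re-weighting.
    w_star = [1.0 / (s * s + tau2) for s in ses]
    sw_star = sum(w_star)
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sw_star
    variance = 1.0 / sw_star
    chi2 = beta * beta / variance
    return beta, variance, tau2, chi2
```

When the effects are homogeneous, the variance component is zero and the result coincides with the fixed-effects estimate; otherwise the combined variance is larger, which makes the association test more conservative.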

Output and analysis summaries

For each SNP, GWAMA will output a variety of summary information and statistics: (i) reference allele to which effects have been aligned and the corresponding non-reference allele; (ii) meta-analysis allelic effect estimate and standard error (or odds ratio and 95% confidence interval for a dichotomous phenotype); (iii) meta-analysis association test statistic, and corresponding *p*-value; (iv) heterogeneity test statistics *Q* (with *p*-value) and *I*^{2}; (v) heterogeneity summary, where each study is coded as '+' for increased effect of the reference allele, '-' for decreased effect of the reference allele, '0' for no effect of the reference allele, at a pre-specified significance threshold, and '?' if the study did not report an effect for the SNP. The output from GWAMA can be used with R scripts, supplied with the software, to generate QQ and Manhattan plots to summarise the genome-wide meta-analysis.
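The per-study direction coding in item (v) can be illustrated with the sketch below. The function name `direction_summary`, the default threshold of 0.05, and the use of a two-sided Wald test on each study's effect are assumptions made for illustration; the text only states that the threshold is pre-specified.

```python
import math


def direction_summary(betas, ses, alpha=0.05):
    """Code each study as '+', '-', '0' or '?' for one SNP.

    None in `betas` or `ses` marks a study that did not report the SNP.
    """
    codes = []
    for b, s in zip(betas, ses):
        if b is None or s is None:
            codes.append("?")
            continue
        # Two-sided Wald p-value for the study-level effect (assumed test).
        p = math.erfc(abs(b / s) / math.sqrt(2.0))
        if p >= alpha:
            codes.append("0")
        else:
            codes.append("+" if b > 0 else "-")
    return "".join(codes)
```

A string such as `+-0?` then indicates effects in opposite directions in the first two studies, no significant effect in the third, and no reported effect in the fourth, which is the kind of inconsistency the summary is intended to highlight.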