We briefly describe here the application of meta-analysis to GWA studies of T2D [42]. Three T2D GWA scans (WTCCC, DGI, FUSION) [43] were combined in a meta-analysis framework. Details of the design of these studies can be found in their original publications [43]. Genotypes at untyped HapMap SNPs were imputed in each of the 3 individual studies. Stringent quality control was carried out for directly typed and imputed variants, and each study was corrected for population stratification before summary results were combined across a total of 10,128 samples. Approximately 2.2 million SNPs survived quality control. To combine the data, both fixed-effects odds ratio (OR)-based and p-value-based meta-analysis approaches were used.
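The two ways of combining summary results mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline used in the T2D meta-analysis: it assumes inverse-variance weighting for the fixed-effects OR-based analysis and a sample-size-weighted Z-score (Stouffer-type) method for the p-value-based analysis. The function names and all inputs are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used to turn p-values into Z-scores


def fixed_effects_or(log_ors, std_errs):
    """Inverse-variance fixed-effects pooling of per-study log odds ratios.

    Returns (pooled OR, standard error of the pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled_log_or = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_log_or / pooled_se
    # erfc gives the two-sided normal tail without cancellation error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log_or), pooled_se, p_value


def weighted_z(p_values, directions, sample_sizes):
    """P-value-based meta-analysis (assumed here: sqrt(N)-weighted Z-scores).

    directions holds +1 or -1 for the direction of effect of the reference
    allele in each study. Each two-sided p-value must lie in (0, 1].
    """
    z_scores = [d * _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
                for p, d in zip(p_values, directions)]
    weights = [math.sqrt(n) for n in sample_sizes]
    z = (sum(w * zi for w, zi in zip(weights, z_scores))
         / math.sqrt(sum(w * w for w in weights)))
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-study results for a single SNP in three scans
pooled_or, pooled_se, p_fixed = fixed_effects_or(
    [math.log(1.12), math.log(1.08), math.log(1.15)], [0.05, 0.06, 0.07])
z_meta, p_z = weighted_z([0.02, 0.20, 0.04], [1, 1, 1], [4000, 3000, 2500])
```

The OR-based approach needs effect estimates on a common scale, whereas the Z-score approach needs only p-values, directions of effect and sample sizes, which makes it a useful cross-check when studies differ in design.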
In combining the 3 GWA scans, several challenges were encountered and addressed analytically. The 3 studies had been carried out on different genotyping platforms. To overcome this, genotypes at untyped HapMap SNPs were imputed, which maximized use of the available information. The scans also had different ascertainment schemes, for example in terms of matching cases and controls for BMI, a known T2D risk factor. To address this, both fixed- and random-effects meta-analyses were carried out and evidence for informative heterogeneity was examined. One prime example of informative heterogeneity is the FTO locus, the first robustly replicating obesity susceptibility locus [47], which was associated with T2D in some scans but not in others. In particular, this locus did not achieve significant evidence for association with T2D in scans that had matched cases and controls for BMI.
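To make the contrast between fixed- and random-effects analyses concrete, the sketch below computes Cochran's Q, the I² statistic and a DerSimonian-Laird random-effects estimate from per-study log odds ratios. The choice of the DerSimonian-Laird estimator is an assumption made for illustration; the text does not state which random-effects method was used, and the function name and inputs are hypothetical.

```python
import math


def dersimonian_laird(log_ors, std_errs):
    """Heterogeneity statistics and a DerSimonian-Laird random-effects estimate."""
    w = [1.0 / se ** 2 for se in std_errs]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effects estimate
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1

    # Method-of-moments estimate of the between-study variance tau^2
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errs]
    pooled = sum(wi * b for wi, b in zip(w_re, log_ors)) / sum(w_re)
    pooled_se = math.sqrt(1.0 / sum(w_re))
    return {"fixed": fixed, "random": pooled, "random_se": pooled_se,
            "Q": q, "tau2": tau2, "I2": i2}


# Hypothetical pattern resembling the one described for FTO: an effect in
# two scans and none in a scan that matched cases and controls for BMI.
result = dersimonian_laird([0.20, 0.18, 0.00], [0.04, 0.05, 0.05])
```

A large Q relative to its degrees of freedom (and a high I²) flags a locus where the per-study estimates disagree by more than sampling error would explain, which is the cue to look for a design difference such as BMI matching.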
Different imputation methods had been used to infer genotypes at untyped variants across the 3 studies. A stringent approach to quality control of the imputed genotypes was therefore taken. The robustness of results was ensured by directly genotyping in the original samples any SNPs that gave significant signals based on imputed genotype data.
Finally, population stratification was dealt with in multiple ways, to ensure that the emerging signals were not spurious associations. Among the different approaches taken, directly typed and imputed data were adjusted using genomic control in each scan separately, and the combined meta-analysis results were then adjusted using genomic control again.
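The genomic control adjustment can be sketched as below, assuming the usual median-based inflation factor for 1-degree-of-freedom chi-square statistics. The function name and the example statistics are hypothetical, and the convention of not adjusting when the inflation factor is below 1 is an assumption rather than something stated in the text.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return (lambda, adjusted statistics) for 1-df chi-square test statistics.

    lambda is the observed median divided by the expected median under the
    null. Statistics are deflated by lambda only when lambda exceeds 1.
    """
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]


# Hypothetical statistics from one scan; a real scan has millions of SNPs
lambda_scan, adjusted_scan = genomic_control([0.3, 0.6, 0.9, 2.5, 11.0])
```

Under the two-step scheme described above, the same function would be applied once to the statistics of each scan and once more to the statistics produced by the meta-analysis.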
A 3-stage approach was pursued (see the study design figure). In the first stage, data on the 2.2 million SNPs were combined across 10,128 samples in a meta-analysis framework. In the second stage, 69 SNPs were selected and genotyped in 22,426 additional independent samples, all of European descent. In the third stage, 11 of these promising signals were taken forward to a further independent set of samples of European descent, consisting of 57,366 T2D cases and controls.
Figure: Type 2 diabetes study design.
A large excess of associated SNPs was observed in stages 2 and 3. In stage 2, out of 65 independent signals, 10 achieved a p-value < 0.05 for association with T2D, whereas only 1.6 were expected to do so by chance. In stage 3, out of 10 independent signals, 7 reached a p-value < 0.05 for association with T2D, as opposed to an expected 0.25 under the null. It is noteworthy that the associated SNPs all had small observed effect sizes, with the largest allelic odds ratio being only 1.15. This underlines the need for very large-scale meta-analysis and replication efforts.
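The expected counts quoted above can be reproduced with simple arithmetic. The sketch below assumes a per-signal null probability of 0.025, which is the value that yields the reported expectations (65 × 0.025 ≈ 1.6 and 10 × 0.025 = 0.25); one reading is that a signal had to reach p < 0.05 with the same direction of effect as in the previous stage, but the text does not say so explicitly. The binomial tail additionally assumes independent signals, and the function names are hypothetical.

```python
import math


def expected_hits(n_signals, null_prob):
    """Expected number of signals passing the threshold under the null."""
    return n_signals * null_prob


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of at least k null hits."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


NULL_PROB = 0.025  # assumption: chosen to match the reported expectations

stage2_expected = expected_hits(65, NULL_PROB)   # 1.625, reported as 1.6
stage3_expected = expected_hits(10, NULL_PROB)   # 0.25

# Probability of seeing at least the observed number of hits by chance alone
stage2_excess_p = binomial_tail(65, 10, NULL_PROB)
stage3_excess_p = binomial_tail(10, 7, NULL_PROB)
```

Both tail probabilities are very small under these assumptions, which is what "a large excess" means quantitatively: the observed numbers of replicating signals are far beyond what chance alone would produce.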
The accompanying table summarizes the genetic loci that had substantial statistical support for association with T2D as of the summer of 2008. As shown, some of these loci were identified through non-GWA approaches. Others emerged from single GWA studies combined with data from several replication datasets, while a number have emerged only recently from meta-analysis of several GWA scans and replication datasets, as described above. A common feature is that a single study has rarely been able to reach genome-wide significance for a specific locus. Reaching it has required combining data from several studies, GWA scans and focused replication efforts.
Table: Gene loci with common variants associated with type 2 diabetes and their method of identification.
Moreover, the causal variants at these loci have not yet been identified. Even after very large-scale GWA meta-analyses, extensive fine-mapping and targeted resequencing experiments are required before the truly causal variants can be confidently identified [5]. Finally, evaluating how far these associations generalize would need still further replication studies in different settings and types of participants (e.g. different racial groups, or different risk populations [48