The focus of most previous GWAS publications has been on discovery of novel biology, tested in large association studies in a “hypothesis-free” manner. Yet by virtue of their broad genomic coverage, GWASs also include loci that have been previously associated with phenotypes in candidate gene or fine-mapping association studies. We thus have the opportunity to evaluate many previously reported asthma associations in a well-characterized asthma cohort using GWAS data, assess the reproducibility of these associations, test additional common variants in these loci, and ask the question: “Are these genes truly associated with asthma?” In this work, we used GWAS data to systematically test the most promising asthma candidate genes (i.e., those previously associated in two or more cohorts). We were able to replicate findings at the SNP and gene level for multiple genes, but failed to replicate many others; both points merit discussion.
We found evidence for SNP-level replication for six genes in this cohort. All six associations are modest, with transmitted:untransmitted ratios of 1.18–1.43, leading to P values in the 0.01–0.05 range (see ). They were appreciated here only because of a focus on candidate genes; none would have been detected in the context of a typical GWAS analysis, where stringent correction for multiple comparisons is required (i.e., α < 10⁻⁶). This is the first independent replication for integrin β3 (ITGB3). Two previous case–control analyses in white children showed association with ITGB3, but the directionality of the minor allele differed between the two studies (18). Our study adds support to the finding that the minor allele is actually the risk allele at this locus. In addition, the family-based nature of our cohort makes population stratification an unlikely false-positive cause of these asthma associations, thus increasing the likelihood that these associations are real. Another convincing replication is in ORMDL3, a candidate gene discovered via a GWAS (20). All associated SNPs identified in CAMP are in strong LD with SNPs from the initial report of association (); although the precise causative locus for asthma risk is uncertain, our results are consistent with the findings of others that this region harbors an asthma susceptibility locus (20).
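To make the link between a transmitted:untransmitted ratio and its P value concrete, the following minimal sketch applies the standard McNemar-type transmission disequilibrium test (TDT) statistic, χ² = (T − U)²/(T + U), with 1 degree of freedom. The counts are hypothetical and are not taken from this study, and the study's own family-based test may differ from this textbook form; the point is only that the P value for a given ratio depends on how many informative transmissions are observed.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return (chi-square, two-sided P) for the classic TDT with 1 df."""
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / n
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration
    for t, u in [(118, 100), (143, 100), (354, 300)]:
        chi2, p = tdt(t, u)
        print(f"T:U = {t}:{u} (ratio {t / u:.2f})  chi2 = {chi2:.2f}  P = {p:.4f}")
```

Running the loop with the same ratio but different totals shows why modest ratios reach nominal significance only when enough heterozygous parents are available.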
We focused not only on SNP-level replication, but also more broadly assessed each gene and its LD-flanking region for evidence of association with asthma. We found an additional 54 SNPs in 15 genes that were nominally associated (P < 0.05). As shown in , the SNPs associated with asthma in our study were frequently distant from those noted in the original publications. Given that none of these SNPs met the threshold after correction for multiple comparisons (P < 8 × 10⁻⁵), it is certainly possible that some of these findings represent false positives. Alternatively, these new associations may reflect differences between populations in LD structure or may be due to allelic heterogeneity, with two or more causative variants resulting in a similar phenotype.
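The gap between the nominal threshold and the corrected threshold can be sketched as follows. The number of tests used here (625) is an assumption, back-calculated from 0.05 / (8 × 10⁻⁵), and is not a figure stated in this section. The sketch computes a Bonferroni threshold and the number of nominally significant results expected by chance if every null hypothesis were true; the second quantity assumes independent tests, which LD between neighboring SNPs violates.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests with P < alpha when all null hypotheses hold."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha * n_tests


if __name__ == "__main__":
    N_TESTS = 625  # assumption: inferred from 0.05 / 8e-5, not reported here
    print(f"Bonferroni threshold: {bonferroni_threshold(0.05, N_TESTS):.1e}")
    print(f"Expected chance hits at P < 0.05: {expected_false_positives(0.05, N_TESTS):.1f}")
```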
Our results may be most striking for our failure to find more evidence of replication, at both the SNP and gene levels. Only a small minority of previously identified SNPs (10 of 93 SNPs either tested directly or LD-tagged) could be replicated. Only 17 of the 39 genes showed any evidence of association with asthma (11 with association at the gene level only, 2 with only SNP-level replication, and 4 with both). Thus, even using the liberal standard of “any P < 0.05” to suggest replication, we failed to find any association with asthma in the majority of previously identified genes. Why did we not find more evidence of replication? One potential answer is inadequate coverage of common variation using a GWAS genotyping platform. The Illumina 550K array tagged 70% or more of HapMap SNPs with r² greater than 0.8 for 29 of the 39 genes, and SNP-level coverage would not have improved using an alternative GWAS platform (the Affymetrix genome-wide human SNP array 6.0). However, these estimates are derived with HapMap data (from which LD data were used to inform SNP content on SNP arrays) and may thus overestimate genetic coverage on current GWAS microarray platforms. More recent estimates suggest substantially lower coverage, approaching 50% (24). This sparseness would diminish power (and thus sensitivity) to detect true associations.
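The coverage figures above rest on pairwise LD. As a minimal sketch, the code below computes r² between two biallelic SNPs from haplotype and allele frequencies, and then the fraction of reference SNPs in a gene whose best tag on the array exceeds an r² cutoff. The function names and the per-SNP values are hypothetical; real coverage estimates depend on the reference panel and population used.

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """LD r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


def coverage(best_r2_per_snp: list[float], threshold: float = 0.8) -> float:
    """Fraction of reference SNPs whose best array tag has r^2 above threshold."""
    if not best_r2_per_snp:
        raise ValueError("no reference SNPs supplied")
    tagged = sum(r2 > threshold for r2 in best_r2_per_snp)
    return tagged / len(best_r2_per_snp)


if __name__ == "__main__":
    # Hypothetical best-tag r^2 values for ten reference SNPs in one gene
    gene_r2 = [1.0, 0.95, 0.85, 0.6, 0.3, 0.9, 0.82, 0.1, 0.99, 0.81]
    print(f"coverage at r^2 > 0.8: {coverage(gene_r2):.0%}")
```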
Poor coverage is clearly not the answer for the 93 SNPs that were directly tagged or tested; even among those SNPs, we found an association for only 10. Lack of statistical power could further explain our negative findings. We studied 403 trios, a substantially larger population than in the vast majority of previous publications, and had 80% power to detect an OR of 1.4–1.7 in these genes. It is certainly possible that we missed smaller effects, however; genetic studies suffer from a well-described “winner's curse” phenomenon, in which initial publications tend to overestimate effect sizes (10). Genes were more likely to replicate if they were larger and more SNPs were tested; in both analyses, SNPs with higher MAF were more likely to be positive. These findings suggest that higher power would have increased the number of significant findings. Thus, for the 24 genes without evidence of replication, it may be that their true effect size is an OR of 1.3 or less, and thus even larger sample sizes are needed to identify a true association.
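The dependence of power on effect size and MAF can be illustrated with a simplified analytic approximation for the TDT in trios. This is a sketch under stated assumptions, not the power calculation performed for this study: it assumes a multiplicative genotype relative risk (`grr`), one affected child per trio, the risk allele at frequency `maf`, and a normal approximation with the number of heterozygous parents fixed at its expected value. Under that model, the probability that a parent of an affected child is heterozygous is pq(γ + 1)/(pγ + q), and a heterozygous parent transmits the risk allele with probability γ/(1 + γ).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def tdt_power(n_trios: int, maf: float, grr: float, alpha: float = 0.05) -> float:
    """Approximate two-sided power of the TDT under a multiplicative model."""
    if not 0 < maf < 1:
        raise ValueError("maf must be strictly between 0 and 1")
    if grr <= 0 or n_trios <= 0:
        raise ValueError("grr and n_trios must be positive")
    p, q = maf, 1 - maf
    # Probability that a parent of an affected child is heterozygous
    het = p * q * (grr + 1) / (p * grr + q)
    # Probability that a heterozygous parent transmits the risk allele
    t = grr / (1 + grr)
    n_informative = 2 * n_trios * het  # expected informative transmissions
    # z = (T - U) / sqrt(T + U) has this mean and standard deviation
    mean_z = sqrt(n_informative) * (2 * t - 1)
    sd_z = 2 * sqrt(t * (1 - t))
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper = _STD_NORMAL.cdf((mean_z - z_crit) / sd_z)
    lower = _STD_NORMAL.cdf((-mean_z - z_crit) / sd_z)
    return upper + lower


if __name__ == "__main__":
    for maf in (0.05, 0.1, 0.3):
        for grr in (1.3, 1.5, 1.7):
            print(f"MAF {maf:.2f}, GRR {grr:.1f}: power = {tdt_power(403, maf, grr):.2f}")
```

With a fixed number of trios, the approximation shows power falling off for both rarer alleles and smaller relative risks, which is the pattern described above.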
Another potential cause of failure to replicate involves heterogeneity between studies. Definitions of asthma vary widely. We used an extraordinarily well-phenotyped cohort of children participating in a clinical trial, all with documented doctor-diagnosed asthma and a positive methacholine challenge test, whereas many studies rely on self-reported asthma or wheezing. Heterogeneity in the age and race of subjects could reduce power as well. We note that although we ran our analyses under an additive model, we also assessed for association under a recessive or dominant model if these were supported in the literature. In no case did we find additional significant SNPs; thus, model misspecification is unlikely to explain our negative results. Finally, we note that evidence of replication at either the SNP or gene level was observed for five of six genes identified by position-based genetic mapping approaches, i.e., linkage analysis or GWAS (GPR154, ORMDL3, CHI3L1, PHF11, and DPP10, but not ADAM33). This compares with replication for less than 50% of primarily biological candidates. It is interesting to speculate that susceptibility variants identified by hypothesis-free gene mapping more consistently contribute to disease liability across populations.
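The three genetic models mentioned above differ only in how a genotype is coded before testing. A minimal sketch, assuming genotypes are stored as minor-allele counts (0, 1, or 2) and that each model is defined with respect to the minor allele; the function name is hypothetical.

```python
def code_genotype(minor_allele_count: int, model: str) -> int:
    """Recode a minor-allele count (0, 1, 2) under a given genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "additive":
        return minor_allele_count  # each copy adds the same effect
    if model == "dominant":
        return int(minor_allele_count >= 1)  # one copy is sufficient
    if model == "recessive":
        return int(minor_allele_count == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        coded = [code_genotype(g, model) for g in (0, 1, 2)]
        print(f"{model:>9}: {coded}")
```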
So what does GWAS add to our understanding of candidate genes? It is most obviously useful when previously identified SNPs are directly represented (the 93 SNPs testable here, for example). As the number of available asthma GWASs increases, we may soon be able to pool results from multiple cohorts and thus have sufficient power to answer more definitively whether those SNPs are truly asthma susceptibility loci. GWAS also facilitates broader surveys of common variation within these candidate genes and can reveal novel candidate susceptibility SNPs, as illustrated in 15 genes in this study. However, where genetic coverage is sparse, GWAS is less well suited for “ruling out” candidate genes. In these instances, negative studies require not only larger samples but also direct genotyping of candidate variants not represented (either directly or indirectly) on the arrays.
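One common way to pool per-SNP results across cohorts is a fixed-effect, inverse-variance-weighted meta-analysis of log odds ratios. The text does not specify a pooling method, so the choice of this technique is an assumption, and the input values below are hypothetical. The sketch shows how two cohorts with the same modest effect yield a smaller pooled standard error, and hence a smaller P value, than either cohort alone.

```python
from math import erfc, exp, sqrt


def fixed_effect_meta(log_ors: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Return (pooled log OR, pooled SE, two-sided P) by inverse-variance weighting."""
    if not log_ors or len(log_ors) != len(ses):
        raise ValueError("log_ors and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    weights = [1 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = sqrt(1 / total_weight)
    z = pooled / pooled_se
    # Two-sided P from the standard normal: P = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return pooled, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical per-cohort estimates for one SNP
    log_ors = [0.20, 0.20]
    ses = [0.10, 0.10]
    pooled, se, p = fixed_effect_meta(log_ors, ses)
    print(f"pooled OR = {exp(pooled):.2f}, SE(log OR) = {se:.3f}, P = {p:.4f}")
```

A fixed-effect model assumes a common true effect across cohorts; given the between-study heterogeneity discussed earlier, a random-effects model may be more appropriate in practice.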
In summary, we have performed the first systematic assessment of asthma genes by GWAS technology. We found evidence of SNP-level replication in 6 genes, and gene-level replication for 15. We anticipate that GWAS data will continue to be used to evaluate candidate genes. As results from additional asthma GWASs become available and investigators pool their results, we will be able to more definitively resolve which of these candidates are truly asthma genes.