We found that applying a polygenic model to an increasing number of SNPs from one GWAS to another, within and across prostate and breast cancer, seemed to explain an increasing proportion of heritability, although the proportion explained remained quite low. Nevertheless, there is a growing appreciation that such common complex diseases may arise from a large number of genetic and environmental risk factors (Purcell et al., 2009; Yang et al., 2010). Applying polygenic models to genome-wide data can help explain a larger proportion of the heritability than simply focusing on the handful of most statistically significant results.
The strongest common polygenic model resulted from applying the breast cancer log odds ratios as weights to the nonaggressive prostate cancer genotypes. A slightly weaker shared model was observed when reversing this and applying the nonaggressive prostate cancer log odds ratios to the breast cancer genotypes. The difference may reflect the larger sample size in the breast cancer GWAS, which would allow for more accurate estimation of the log odds ratios. These cancers have biological similarities and common factors that may control hormone-dependent and -independent tumor development; in particular, there are similarities in the key hormone signaling pathways (e.g., steroid biosynthesis) across these cancers (Risbridger et al., 2010). Why a relationship exists only between nonaggressive prostate cancer and breast cancer remains unclear. One possibility is that a similar hormonal mechanism underlies the development of these cancers, but a distinct mechanism underlies disease progression. Another is that the CGEMs nonaggressive prostate cancer and breast cancer samples might be more similar because the latter were not selected based on phenotypic characteristics.
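The weighting step described above can be illustrated with a minimal sketch. It assumes the usual additive form of a polygenic score, in which each individual's risk-allele counts in the target study are multiplied by the log odds ratios estimated in the discovery study and summed. The function name, argument layout, and toy values are hypothetical and are not taken from the analysis reported here.

```python
import math


def polygenic_scores(genotypes, odds_ratios):
    """Return one additive polygenic score per individual.

    genotypes   -- list of individuals, each a list of risk-allele counts
                   (0, 1, or 2) for the same ordered set of SNPs
                   (target study, e.g. nonaggressive prostate cancer)
    odds_ratios -- per-SNP odds ratios from the discovery GWAS
                   (e.g. breast cancer); hypothetical values below
    """
    # The weights are the log odds ratios from the discovery study.
    weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]
    scores = []
    for person in genotypes:
        if len(person) != len(weights):
            raise ValueError("each individual needs one genotype per SNP")
        # Weighted sum of allele counts across SNPs.
        scores.append(sum(w * g for w, g in zip(weights, person)))
    return scores


# Toy example: two individuals, three SNPs (illustrative numbers only).
example_scores = polygenic_scores([[0, 1, 2], [2, 0, 1]], [1.2, 0.9, 1.05])
print(example_scores)
```

Reversing the roles of the two studies, as in the weaker shared model, would amount to swapping which study supplies the odds ratios and which supplies the genotypes. How the resulting scores are then related to case status is not shown in this sketch.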
Our findings were only slightly weakened when we removed the most statistically significant associations from the model, restricting the p-value range to [0.01, 0.2] or [0.05, 0.2]. This suggests that the results are not entirely driven by the strongest associations, and that variants initially deemed “nonsignificant” in the GWAS of prostate and breast cancer may still help explain some of the heritability of these diseases. Moreover, the results changed little when using different linkage disequilibrium (LD) filters to remove variants that are correlated and thus may reflect the same association with disease. In particular, a more conservative LD filter (r² threshold of 0.25) removed more SNPs from consideration, leading to slightly weaker results. When no LD filter was applied, the findings were stronger than reported here. Note that SNPs that exhibit even lower LD (e.g., at an r² threshold of 0.1) with a limited number of causal variants could explain some of the findings observed here; further work will explore this possibility.
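The two filters discussed above, a p-value window and an LD filter, can be sketched as follows. This is only an illustration under stated assumptions: r² is approximated by the squared Pearson correlation between allele-count vectors, and pruning is done greedily in order of increasing p-value. The source does not specify the pruning procedure or software actually used, and all names and data here are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def select_snps(p_values, genotype_columns, p_low, p_high, r2_max):
    """Return sorted indices of SNPs passing both filters.

    Assumed procedure: keep SNPs with p_low <= p <= p_high, then walk
    them from smallest to largest p-value and retain a SNP only if its
    r-squared with every SNP already retained is below r2_max.
    """
    candidates = [j for j, p in enumerate(p_values) if p_low <= p <= p_high]
    candidates.sort(key=lambda j: p_values[j])
    kept = []
    for j in candidates:
        if all(r_squared(genotype_columns[j], genotype_columns[k]) < r2_max
               for k in kept):
            kept.append(j)
    return sorted(kept)


# Toy data: five SNPs (one list of allele counts per SNP), six individuals.
toy_p = [0.001, 0.02, 0.03, 0.15, 0.5]
toy_genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 1],  # strongly correlated with the SNP above
    [1, 0, 1, 2, 1, 1],
    [1, 1, 0, 2, 0, 1],
]
print(select_snps(toy_p, toy_genotypes, 0.01, 0.2, 0.25))
```

Setting `r2_max` above 1 in this sketch disables the LD filter, which retains more correlated SNPs, in line with the observation that a more conservative filter leaves fewer SNPs in the model.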
Although the proportion of heritability explained increased with larger numbers of variants in our polygenic model, the overall heritability explained remained quite low. This may reflect the reduced power due to limited sample sizes in the initial stages of the CGEMs GWAS considered here (Hunter et al., 2007; Yeager et al., 2007). Larger sample sizes may allow for more accurate estimation of the log odds ratio weights and for detecting more statistically significant results from the polygenic model. On a related topic, Nature Genetics now requires that power calculations be included in manuscripts presenting results from association studies (Anonymous, 2010). Calculating power post hoc is a bit nonsensical, because one has already completed the GWAS, and is subject to much debate in the statistical literature (Hoenig and Heisey, 2001).
Although we have focused on common SNPs from GWAS, polygenic models can also incorporate less common variants and additional sources of genomic variation [e.g., copy number variants (CNVs)]. Continued scientific and technological advances will allow investigators to study less common and different sources of genetic variation. Results from the 1000 Genomes Project (www.1000genomes.org) can be leveraged to assay less common SNPs. Moreover, sequencing technologies are rapidly decreasing in cost, and eventually genome-wide sequence studies will become feasible and provide an unprecedented opportunity to investigate polygenic models for disease.
In contrast with the polygenic model considered here, the conventional approach to GWAS entails evaluating each genetic variant one at a time, and then attempting to replicate only those most strongly associated with disease. In light of the enormous number of tests undertaken with GWAS, a very small alpha-level is generally used to determine “statistical significance” (e.g., p
). Although adhering to such strict “significance” cut points helps address issues of multiple comparisons, the cut points are somewhat arbitrary and do not reflect the potential clinical or biological importance of an association (Witte et al., 1996). Moreover, as shown here and elsewhere (Purcell et al., 2009; Yang et al., 2010), genetic variants that do not appear strongly associated may actually contribute to the underlying genomic basis of disease. By taking a broad “genome-wide” view, a polygenic model may provide a more complete understanding of the genetic architecture of complex phenotypes such as prostate and breast cancer (Witte, 2010