One should be cautious about taking our estimates of causal effects too literally. They are dependent on prior distributions, and have wide credible intervals. Although we have shown that our procedure could infer a rare variant if one were present, point estimates of its allele frequency and relative risk are heavily biased. Several groups are currently engaged in fine mapping and resequencing efforts in the regions studied, which will lead to more direct estimates of causal effect sizes. Thus the quantitative estimates presented here will eventually be redundant, although it will be interesting to compare our estimates to the actual causal effects when known.
Instead we emphasise the qualitative nature of our results, which indicate that most, if not all, associations with breast cancer so far identified by GWAS are likely to be markers for common causal variants with modest effects. This is consistent with the CDCV hypothesis that originally motivated GWAS, but not with recent suggestions that many GWAS hits could be markers for rare causal variants(10). In this respect our results agree with other recent work in support of the CDCV hypothesis. Anderson et al(11) argued that GWAS has low power to find a rare variant that had not already been detected by linkage, and noted examples of resequencing projects that had not identified rare variants underlying a common GWAS hit. This includes currently unpublished work by the Wellcome Trust Case-Control Consortium in which sequencing of 16 regions identified by GWAS did not identify any underlying rare causal variant. Wray et al(12) show that the distribution of risk allele frequencies from currently known GWAS hits is consistent with the majority of these hits arising from common variants. Iles(22) showed that early findings of GWAS have been at loci for which the power is highest, which are indeed the common variants. Since we confined attention to SNPs that were identified in the first wave of breast cancer GWAS and have subsequently been replicated, we should expect these loci to be enriched for common variants, and in this respect our results are unsurprising. But in contrast to these other studies we are able to estimate causal effects for specific loci rather than average properties of all causal variants. Our results indicate that these loci are consistent with the general pattern of common causal variation suggested by other work, and our methods can be applied to further markers that emerge from GWAS.
Our approach cannot distinguish between the effect of a single common variant and the average effect of a number of variants with a common total frequency. While such a scenario is theoretically possible for a complex disease(9), Wray et al have argued against this scenario for the loci found to date(12). We cannot lend support to either position here, other than to note that all ten SNPs indicated a common causal variant, suggesting that if rare variants do underlie these associations then they do so either in large numbers or not at all.
Several authors (13) have used simulations to estimate the empirical conditional distribution of causal allele frequencies and relative risks, given that a marker was identified by GWAS and subsequently replicated. Our approach to modelling the LD between markers and causal variants is much simpler, but we found this model had little effect on the parameters of interest. We do not explicitly model the process of marker discovery by GWAS, and in that respect our prior is more favourable to rare variants, thus strengthening our conclusion that the causal variants are common.
The use of familial cases in association studies is motivated by the excess relative risk in the ascertained sample compared to a sample of unselected cases. We have shown, however, that imperfect correlation between markers and causal variants leads to an excess risk in familial cases that differs from the predicted value. The difference could be in either direction, and indeed when the causal and marker variants have similar frequency, the excess risk is higher at the marker than at the causal variant, so that the study design is even more efficient than predicted (). In our data, however, there was a systematic attenuation of the excess risk in bilateral cases, similar to observations for familial cases in the study of Turnbull et al(6), which is most consistent with causal variants of higher frequency than the markers. The efficiency of bilateral sampling, while still greater than that of unselected sampling, appears to be less than predicted, and this may have implications for the design of future studies of common genetic risk factors.
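The dependence of the excess risk on marker and causal allele frequencies can be illustrated with a minimal numerical sketch. It is not the model fitted in this paper: it assumes Hardy-Weinberg equilibrium, a rare disease, a multiplicative causal effect, and that bilateral cases behave like two independent events so that the causal per-allele effect in that sample is the square of the effect in unselected cases. The function names, allele frequencies and relative risk of 1.2 are hypothetical choices made for illustration only.

```python
import math


def excess_risk_ratio(p_marker, q_causal, d, rr):
    """Ratio of log relative risks at a marker: bilateral vs unselected cases.

    p_marker : frequency of the marker risk allele
    q_causal : frequency of the causal risk allele
    d        : LD coefficient D between the two risk alleles
    rr       : per-allele relative risk of the causal variant
    """
    # Frequency of the haplotype carrying both risk alleles.
    h_both = p_marker * q_causal + d
    # Probability that a haplotype carries the causal allele,
    # given which marker allele it carries.
    causal_given_risk = h_both / p_marker
    causal_given_other = (q_causal - h_both) / (1.0 - p_marker)

    def marker_rr(effect):
        # Allelic relative risk observed at the marker when each causal
        # allele multiplies disease risk by `effect`.
        return (1.0 + causal_given_risk * (effect - 1.0)) / (
            1.0 + causal_given_other * (effect - 1.0)
        )

    # Assumption (illustrative): the causal effect in bilateral cases is rr**2.
    return math.log(marker_rr(rr ** 2)) / math.log(marker_rr(rr))


def d_from_correlation(p_marker, q_causal, rho):
    """Convert an allelic correlation rho into the LD coefficient D."""
    return rho * math.sqrt(
        p_marker * (1 - p_marker) * q_causal * (1 - q_causal)
    )


# Marker is itself the causal variant (perfect LD).
perfect = excess_risk_ratio(0.3, 0.3, d_from_correlation(0.3, 0.3, 1.0), 1.2)
# Marker and causal variant of equal frequency, allelic correlation 0.9.
similar = excess_risk_ratio(0.3, 0.3, d_from_correlation(0.3, 0.3, 0.9), 1.2)
# Causal variant more common than the marker, with every marker risk
# allele on a causal haplotype (D' = 1, so D = p * (1 - q)).
commoner = excess_risk_ratio(0.2, 0.5, 0.2 * (1 - 0.5), 1.2)

print(perfect, similar, commoner)
```

Under these assumptions the ratio equals 2 when the marker is the causal variant, is slightly above 2 in the equal-frequency example, and falls below 2 when the causal variant is more common than the marker, which mirrors the qualitative pattern described above.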
Some other mechanisms can also lead to attenuation of the excess risk. We assumed a multiplicative model in which each copy of the risk allele multiplies the disease risk to the same degree, but the true model could be recessive, dominant or more general. We can rewrite equations 1.2 in terms of recessive or dominant effects: it turns out that under a recessive model the excess risk attenuates at higher causal frequencies than under the multiplicative model, whereas for a dominant model it attenuates at lower frequencies (results not shown). Dominant causal variants could therefore be more consistent with rare variation than the multiplicative model considered, but the relevant probabilities remained low when we assumed this model in our analyses, and for brevity we have omitted these results.
We have also assumed that effects act on the log-risk scale, which is convenient because additional polygenic and environmental effects cancel out of relative risk calculations, so we need not assume a model for them. If, however, the effects act on, say, the logistic or probit scales, then the excess relative risk would be attenuated even at the causal variant. We considered this possibility by allowing for a normally distributed polygenic random effect with mean zero and variance 2log(2), consistent with a sibling relative recurrence of 2(23). Acting on the logistic scale this could reduce the excess relative risk for the causal variant from 2 to 1.8 in bilateral cases, but this is less than the degree of attenuation we observed in our data. Subgroup effects, such as age or tumour subtype specific risks, could also attenuate the marginal excess risk, but we did not observe any such effects in our data.
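The correspondence between a polygenic variance of 2log(2) and a sibling relative recurrence of 2 can be checked with a short derivation. This is a sketch under assumptions not stated in the text: the polygenic effect is purely additive and acts on the log-risk scale, the disease is rare, and siblings therefore share half of the polygenic variance. The symbols $g_1$, $g_2$, $\sigma^2$ and $\lambda_s$ are introduced here for illustration.

```latex
% Polygenic log-risk effects of two siblings (illustrative notation):
%   g_1, g_2 ~ N(0, \sigma^2),  Cov(g_1, g_2) = \sigma^2 / 2
\begin{align*}
\lambda_s
  &= \frac{\mathrm{E}\!\left[e^{g_1 + g_2}\right]}
          {\mathrm{E}\!\left[e^{g_1}\right]\,\mathrm{E}\!\left[e^{g_2}\right]}
   = \frac{\exp\!\left(\tfrac{1}{2}\,\mathrm{Var}(g_1 + g_2)\right)}
          {\exp\!\left(\sigma^2\right)} \\
  &= \frac{\exp\!\left(\tfrac{3}{2}\sigma^2\right)}{\exp\!\left(\sigma^2\right)}
   = \exp\!\left(\tfrac{1}{2}\sigma^2\right),
\end{align*}
% so that \sigma^2 = 2\log 2 gives \lambda_s = \exp(\log 2) = 2.
```

Here $\mathrm{Var}(g_1 + g_2) = 2\sigma^2 + 2\,\mathrm{Cov}(g_1, g_2) = 3\sigma^2$, and the result uses the moment generating function of the normal distribution.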
We have shown that genetic markers of breast cancer have lower excess risk in familial cases than had been predicted, leading to reduced improvements of efficiency in these study designs. However this information can be usefully exploited to estimate the relative risk and allele frequency of the underlying causal variants. Despite using a prior distribution that favours rare variation, we showed that data from bilateral and familial cases strongly imply that the causal variants underlying recent GWAS findings are common with modest effects, in line with other recent work favouring the CDCV hypothesis. We look forward to the outcome of current fine mapping projects to confirm the accuracy of these predictions.