Patterns of linkage disequilibrium (LD) in human populations are complicated and preclude analytical results, so we adopted a simulation approach (see Methods for details). We describe the approach informally before describing our results. First, we chose each allele at each SNP in the HapMap ENCODE regions in turn, assuming it to be causative with a given effect size. We then used a previously reported simulation scheme (HAPGEN) to simulate a large population of chromosomes with European ancestry, whose patterns of LD match those in the CEU HapMap analysis panel. From this population we took a case-control sample, with the controls sampled randomly from the population and the cases chosen by oversampling chromosomes carrying the causal allele in the appropriate way given its frequency and assumed effect size. To simulate a GWAS, we considered samples of 2000 cases and 2000 controls typed on the Affymetrix GeneChip Human Mapping 500K Array Set (see www.affymetrix.com). No single sample size can model all reported GWAS, but this size is typical of many. (Later, when considering associated loci from specific diseases that have been studied extensively, we simulated GWAS of larger size.) To simulate a GWAS on a particular commercial chip, we examined data at only those SNPs on the chip in question and checked whether any of these SNPs showed a p-value for association <10⁻⁶. If this occurred, we then modelled a replication study, using an additional 2000 cases and 2000 controls for definiteness. We took the best SNP from the simulated GWAS and examined it in the simulated replication sample to check whether it had p<0.01 in this replication sample. In what follows we only considered those simulations where the best SNP on the genotyping chip met both these criteria, as these model the ascertainment implicit in reported GWAS associations. For these simulations, we estimated the effect size at this associated SNP, which we call the hit SNP, in the replication dataset and compared it with the true effect size at the causal variant used for the simulation. Estimating the effect size from the replication dataset is important because it minimises the effect of the “winner's curse”, which would otherwise lead to effect sizes being overestimated. Simulated GWAS and replication samples were generated for a range of assumed true effect sizes.
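The two-stage ascertainment described above can be sketched in a few lines of Python. This is a minimal illustration under strong simplifying assumptions, not the HAPGEN-based procedure used in the study: it considers a single SNP that is itself causal (so LD and chip coverage are ignored), assumes a multiplicative per-allele risk, and uses an allelic odds-ratio z-test chosen here for convenience. All function names are hypothetical.

```python
import math
import random

def case_allele_freq(p, rr):
    """Risk-allele frequency among case chromosomes, assuming a
    multiplicative per-allele relative risk rr (assumption of this sketch)."""
    return p * rr / (p * rr + (1.0 - p))

def allelic_test(a, b, c, d):
    """Odds ratio and two-sided p-value from allele counts:
    a/b = risk/other alleles in cases, c/d = risk/other alleles in controls.
    Assumes all four counts are non-zero."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    z = math.log(odds_ratio) / se
    return odds_ratio, math.erfc(abs(z) / math.sqrt(2.0))

def sample_stage(p, rr, n_cases, n_controls, rng):
    """Draw one case-control sample of chromosomes and test it."""
    n_case_chrom, n_ctrl_chrom = 2 * n_cases, 2 * n_controls
    f_case = case_allele_freq(p, rr)
    a = sum(rng.random() < f_case for _ in range(n_case_chrom))
    c = sum(rng.random() < p for _ in range(n_ctrl_chrom))  # controls ~ population
    return allelic_test(a, n_case_chrom - a, c, n_ctrl_chrom - c)

def simulate_study(p, rr, n_cases=2000, n_controls=2000, seed=0):
    """Return the replication-stage odds ratio if the SNP passes both the
    discovery (p < 1e-6) and replication (p < 0.01) thresholds, else None."""
    rng = random.Random(seed)
    _, p_discovery = sample_stage(p, rr, n_cases, n_controls, rng)
    if p_discovery >= 1e-6:
        return None
    or_replication, p_replication = sample_stage(p, rr, n_cases, n_controls, rng)
    if p_replication >= 0.01:
        return None
    return or_replication
```

Because the estimate is taken from an independent replication sample, it is not inflated by the selection applied at the discovery stage, which is the point made in the text about the winner's curse.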
Reported genome-wide association studies differ in many particular details, including the choice of genotyping chip used and the sizes of the discovery and replication samples. Specific assumptions are necessary for any simulation study, and ours aim to capture the general features of many reported studies. Investigation of different simulation scenarios, including different genotyping chips and sample sizes, did not change the broad conclusions that follow (data not shown).
Effect size estimates
To begin, we compare the estimated effect size at the replicated hit SNP with the true effect size at the causal SNP in the simulation. The accompanying figure (Distribution of estimated effect sizes) illustrates this comparison for three different values of the true effect size. For each we see a peak of estimates around the true effect size assumed at the causal SNP. But note also that the true effect size is often underestimated (mean estimated effect size 1.24, 1.86 and 3.32 for true relative risks of 1.25, 2 and 4 respectively), and that this underestimation can be substantial when the true effect is large. For example, when the true relative risk is 4, the estimated effect size was less than 2 in 12% of simulations in which the GWAS successfully discovered the effect.
Distribution of estimated effect sizes.
In the accompanying figure (Relationship between underestimation and correlation) we plot the relative under- (or over-) estimation of the effect size (estimated effect size divided by true effect size) as a function of the correlation (as measured by r², the square of Pearson's correlation coefficient) between the hit SNP and the true causal variant. The underestimation is seen to be due to imperfect tagging: when the true causal variant is not well tagged by SNPs on the genotyping chip (the correlation is weak), the estimated effect at the hit SNP is often much lower than the true effect. Conversely, when the causal SNP is well tagged by a SNP on the chip, the estimated effects cluster around the true effect size. Note that while underestimation decreases as the correlation between the hit SNP and the causal SNP increases, there remains systematic underestimation even when the hit SNP has r²≈0.8 with the causative SNP. For example, in one third of simulations when the true effect is 2, the estimated effect will be under 1.8. Note also that when the true effect size is large, significant and replicable associations can be detected when the best tag SNP only has r²
≈0.2 with the causal variant (, relative risk
Relationship between underestimation and correlation.
Imperfect tagging and an ascertainment effect also explain the feature of the plots whereby the underestimation is much less for smaller true effect sizes. If the true effect is small and the true causal variant is not well tagged on the genotyping chip, there will not be enough power for the GWAS and subsequent replication to reach significance, with the result that the corresponding simulation will not contribute to the plot. But if the true effect is large there may still be power to see a significant result when the true variant is not well tagged, so the simulation contributes to the plot and shows the underestimation. Put another way, if the true effect is small, it will only be detected in an association study if the causal SNP is well tagged, and in this case the effect size will be estimated reasonably well. This second ascertainment effect explains the lack of underestimation at hit SNPs not strongly correlated with the causal SNP in the left panel of the figure. Lastly, as low-frequency SNPs are less well tagged by other SNPs, the extent of the underestimation also depends on the frequency of the risk allele (see Figure S1). Interestingly, the effect sizes at rare alleles are underestimated to a great extent, but only when the true effect size is large enough for the tag SNP of a rare allele to be detected and replicated in the simulated GWAS.
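The attenuation caused by imperfect tagging can be made concrete with a small population-level calculation. The sketch below is not taken from the study: it assumes a two-SNP haplotype model, a multiplicative per-allele risk at the causal SNP, positive correlation between the risk allele and the tag allele, and controls drawn from the general population. It ignores sampling noise and the ascertainment step, and the function name is hypothetical.

```python
import math

def tag_odds_ratio(p_causal, p_tag, rr, r2):
    """Expected allelic odds ratio at a tag SNP, given the causal risk-allele
    frequency, the tag allele frequency, the per-allele relative risk at the
    causal SNP, and the squared correlation r2 between the two alleles."""
    # LD coefficient D implied by r2 (positive correlation assumed).
    d = math.sqrt(r2) * math.sqrt(
        p_causal * (1 - p_causal) * p_tag * (1 - p_tag))
    # Population haplotype frequencies (C = causal risk allele, T = tag allele).
    haps = [
        p_causal * p_tag + d,              # C T
        p_causal * (1 - p_tag) - d,        # C t
        (1 - p_causal) * p_tag - d,        # c T
        (1 - p_causal) * (1 - p_tag) + d,  # c t
    ]
    if min(haps) < -1e-12:
        raise ValueError("r2 is not attainable for these allele frequencies")
    h_ct_risk, h_c_other, h_tag_only, h_neither = (max(h, 0.0) for h in haps)
    # Case chromosomes are weighted by rr when they carry the causal allele.
    case_odds = (rr * h_ct_risk + h_tag_only) / (rr * h_c_other + h_neither)
    control_odds = p_tag / (1 - p_tag)
    return case_odds / control_odds
```

Under these assumptions, `tag_odds_ratio(0.2, 0.2, 4.0, 0.2)` comes out close to 2: a causal variant with relative risk 4 that is tagged at r²=0.2 looks like a variant of roughly half that effect, while r²=1 recovers the true value and r²=0 gives an odds ratio of 1.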
What true effects might underlie the effects estimated from GWAS?
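The results above describe the distribution of estimated effect sizes as a function of known true effect sizes and the frequency of the risk allele. In practice we are interested in the reverse question: what true effect sizes are plausible in the light of the effect size actually estimated from a GWAS and follow-up study? We will see that this requires assumptions about the true distribution of effect sizes. Indeed, writing RR for relative risk, and RAF (risk allele frequency) for the allele frequency at the risk allele, application of Bayes' theorem gives
$$P(\mathrm{RR}_{\text{true}}, \mathrm{RAF}_{\text{true}} \mid \mathrm{RR}_{\text{observed}}, \mathrm{RAF}_{\text{observed}}) \propto P(\mathrm{RR}_{\text{observed}}, \mathrm{RAF}_{\text{observed}} \mid \mathrm{RR}_{\text{true}}, \mathrm{RAF}_{\text{true}}) \, P(\mathrm{RR}_{\text{true}}, \mathrm{RAF}_{\text{true}}) \tag{1}$$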
where “true” refers to the value at the causal SNP and “observed” refers to the value at the hit SNP. Our simulation study allows us to estimate the first factor on the right hand side of (1), and we do so by discretising both the observed and true RR and RAF and creating a matrix of counts based on our simulations over the ENCODE regions. The second factor on the right hand side of (1) is the assumed joint distribution of true risk allele frequencies and effect sizes, which is of course unknown.
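A discretised version of this calculation can be sketched as follows. The code is an illustration only: the bin layout, the toy counts and the function name are hypothetical, and it assumes that each row of the count matrix is normalised by the total number of simulations run for that true bin, so that simulations which fail the discovery or replication thresholds lower the likelihood. The normalisation actually used in the study is described in its Methods and is not reproduced here.

```python
def posterior_over_true(counts, n_sims, prior, obs_bin):
    """Posterior over true (RR, RAF) bins given one observed bin.

    counts[t][o]: number of successful simulations with true bin t whose
                  hit SNP fell in observed bin o
    n_sims[t]:    number of simulations run for true bin t (assumption:
                  includes simulations that were never 'discovered')
    prior[t]:     prior probability of true bin t
    """
    weights = [
        (counts[t][obs_bin] / n_sims[t]) * prior[t]
        for t in range(len(prior))
    ]
    total = sum(weights)
    if total == 0:
        raise ValueError("observed bin has zero probability under this prior")
    return [w / total for w in weights]

# Toy inputs (hypothetical): three true bins, two observed bins.
toy_counts = [[30, 0], [20, 10], [5, 40]]
toy_n_sims = [100, 100, 100]
toy_prior = [0.7, 0.2, 0.1]
toy_posterior = posterior_over_true(toy_counts, toy_n_sims, toy_prior, obs_bin=0)
```

The prior enters only through `prior[t]`, so swapping one set of prior assumptions for another reuses the same simulated count matrix.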
We proceed by making two different sets of assumptions about these unknowns. In each case we assume that the distribution of risk allele frequencies is given by the empirical distribution of allele frequencies in the ENCODE regions. In effect this assumes that any SNP variant is, a priori, equally likely to affect disease status. What differs between the sets of assumptions is the assumed effect size of a particular variant. Our first set of assumptions posits that the distribution of effect sizes is the same for all putative causal variants, regardless of their allele frequency, and that effect sizes are close to those observed in GWAS. The second set of assumptions explicitly allows for substantially larger effects at variants with smaller minor allele frequency. These priors are described in detail in the Methods.
Different sets of assumptions about true effect sizes and risk allele frequencies necessarily lead to different conclusions, and it is impossible to study all possibilities. A number of theoretical analyses have argued for a relationship between effect size, disease model, and minor allele frequency (MAF). As there is no consensus on the exact form and extent of the relationship we do not rely on them explicitly here, and instead our approach aims to capture two different perspectives on unknown effect sizes, with the subsequent analyses indicating a range of possibilities. The first perspective is that the range of true effect sizes will be close to those estimated from current GWAS. The second captures the possibility that low-frequency variants may have considerably larger effect sizes.
Under either set of assumptions, we can use our simulation study and Bayes' theorem (1) to estimate the conditional distribution of true effect sizes and risk allele frequency (RAF) in the light of the observed data at the GWAS hit SNP. The accompanying figure (Posterior distribution on true relative risk) illustrates this, showing estimates of the posterior distribution of the true effect size conditional on observing a risk estimate between 1.2 and 1.3, for different observed risk allele frequencies, and under the two different prior assumptions on effect size distributions.
Posterior distribution on true relative risk.
A common feature of the histograms in the figure is that the mode of the posterior distribution on the true effect size is on, or very close to, the observed estimate. That is, current GWAS estimates of effect sizes at a common SNP in the range 1.2–1.3 are most likely to be very close to the truth. As expected, estimated effects within this range are more likely to be 1.3 than 1.2, because larger effects are more likely to generate a signal of association strong enough to pass the p-value thresholds commonly implemented in GWAS. This explains the left-hand tail of the distributions represented in the figure.
The figure also shows that there is some probability that the effect size at the causal variant is greater than estimated from the most associated SNP. Interestingly, the observed risk allele frequency impacts our posterior belief about the true effect size, under either set of prior assumptions, with underestimation being more marked when the risk allele at the hit SNP is rarer. Under the conservative prior, when the risk allele at the hit SNP has less than 20% frequency in the control population, the probability that the relative risk is above 1.325 is 55%, compared to 35% when the risk allele frequency is between 20% and 50%. The corresponding numbers for the MAF-dependent prior are 77% and 49%. There are several different phenomena at work here. If the hit SNP is the causal SNP then, assuming that the association is strong enough to be detected and replicated in the GWAS, there is no systematic underestimation (and very little overestimation, as we assume the effect size is estimated from the replication sample). However, conditional on the hit SNP not being causal, the distribution of LD with the true causal SNP, and therefore the propensity for underestimation, depends on its allele frequency. The posterior distribution on the true effect size given the observed frequency and effect of the hit SNP can be viewed as a mixture of these two scenarios, weighted by their conditional probability. Rarer SNPs are less likely to be tagged well by single markers and, as noted above, poor tagging leads to underestimation of effect sizes. In contrast, for a common SNP, the associated allele is more likely to be well correlated with the causal allele, so there is relatively less underestimation. Under the MAF-dependent prior, when the associated allele is low-frequency the causative allele will tend to be low-frequency as well, and so potentially of larger effect.
In the scenario where we believe in larger effects at rare causal alleles and have observed a SNP with low RAF and an estimated relative risk between 1.2 and 1.3, there is a 24% chance that the source of the signal is a variant that at least doubles risk with each copy of the risk allele.
Our observations are similar when the observed risk allele is the most common allele in the population (RAF>50%) and therefore the minor allele is protective (Figure S2). Qualitatively, the same conclusions also apply when the estimated effect size at the hit SNP is weaker, for example in the range 1.05 to 1.2 (Figure S3).
Consequences for individual disease risk
One consequence of the potential underestimation of effect sizes from GWAS findings is that as we move to better identification of the actual causal variants, through fine mapping and/or functional studies of associated regions, our estimates of their effect sizes might well increase. Assuming a multiplicative model of risk across loci, these small expected changes could combine to increase the relative risk of disease in those individuals with highest genetic risk of disease.
To investigate this, we simulated genotypes at known associated loci in a population of individuals (assuming Hardy-Weinberg equilibrium and no linkage disequilibrium across loci) for each of breast cancer, type 2 diabetes and Crohn's disease, based on reported risk allele frequencies (see Tables S3 for a list of loci). First treating the causal loci and relative risks for each disease as given by current GWAS estimates, we measured the average risk of individuals in the top x%, by risk, of the population (for differing values of x) and compared this to the mean risk in the population. We then repeated this simulation, allowing for the uncertainty in the estimation of true effect sizes by averaging over the uncertainty in both the RAF and effect size of the causal variant on the basis of the posterior distributions of these, given the GWAS findings, under the two priors described above. We assumed that risks combined multiplicatively across loci. For NOD2 in Crohn's disease, where the causal variant is thought to be known, here and below we used the effect sizes for the known variant and did not average over uncertainty in these. Because all three diseases have been extensively studied, we approximated the GWAS discovery process as corresponding to a GWAS discovery sample of 5000 cases and 5000 controls, and a replication sample of 10,000 cases and controls. The actual discovery process for each of the diseases is complicated, often involving meta-analysis and/or multistage discovery, and not straightforward to model accurately, but the approach we use should capture the fact that GWAS discoveries were ascertained through study of large numbers of samples.
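The unadjusted version of this calculation can be sketched as below. It is a minimal illustration under the stated assumptions (Hardy-Weinberg equilibrium, independent loci, risks multiplying across loci and across allele copies). The locus list is hypothetical and is not taken from the supplementary tables, and the function names are invented for this example.

```python
import random

def simulate_relative_risks(loci, n_people, seed=0):
    """Simulate genotypes at independent loci under Hardy-Weinberg
    equilibrium and return each person's risk relative to the population mean.

    loci: list of (risk_allele_frequency, per_allele_relative_risk) pairs.
    """
    rng = random.Random(seed)
    risks = []
    for _ in range(n_people):
        risk = 1.0
        for raf, rr in loci:
            # Two independent allele draws give the HWE genotype distribution.
            copies = (rng.random() < raf) + (rng.random() < raf)
            risk *= rr ** copies  # multiplicative within and across loci
        risks.append(risk)
    mean_risk = sum(risks) / len(risks)
    return [r / mean_risk for r in risks]

def top_fraction_mean(relative_risks, fraction):
    """Mean relative risk of the highest-risk `fraction` of the population."""
    k = max(1, int(len(relative_risks) * fraction))
    top = sorted(relative_risks, reverse=True)[:k]
    return sum(top) / k

# Hypothetical loci, for illustration only.
example_loci = [(0.3, 1.2), (0.1, 1.5), (0.5, 1.1)]
example_risks = simulate_relative_risks(example_loci, n_people=5000, seed=42)
```

The adjusted analyses described in the text would replace each fixed `(raf, rr)` pair with draws from the posterior distribution for the causal variant at that locus. That step is omitted here.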
The results of the three simulations are given in the accompanying figure (Adjusted estimates of individual risks). The unadjusted simulations give estimates of how much more at risk individuals with the greatest genetic propensity to disease are, based only on GWAS loci, relative to the average person in the population. As expected, the fold change in risk of individuals carrying a large fraction of risk variants depends on the number and magnitude of known loci. For example, individuals in the top 0.1% of risk for Crohn's disease are 20 times more likely than the average person to develop the condition, whereas for breast cancer, where the number of common loci and the associated relative risks are typically smaller, the equivalent number is just over two-fold.
Adjusted estimates of individual risks.
The second and third simulations attempt to average over the possible outcomes of our future efforts to map causal mutations, to reveal the likely gains in our ability to stratify individuals on the basis of risk. These use the methodology above, under both prior distributions, to average over the posterior distribution of the allele frequency and effect size at the causal SNPs underlying reported GWAS loci for the three diseases. These adjusted estimates are also shown in the figure. Across diseases we see that there is a significant increase in the risk associated with carrying multiple risk variants. In particular, the biggest differences in risk are for those individuals in the extreme tail. It is these individuals who carry the stronger, likely rarer, risk alleles, which are currently insufficiently characterised by the most significant signal of association in some regions identified as important in disease. For example, the risk of an individual in the top 0.1% of the population for genetic risk, typed at the causal loci underlying currently known GWAS loci, will likely be increased by a factor of 3–6.5, 5–12, or 25–50, compared to an average individual, for breast cancer, type 2 diabetes and Crohn's disease respectively. These are notably greater increases in risk than current predictions based on the hit SNPs from GWAS loci, which would be 2.4, 3.5 and 20 respectively.
We have shown above that as we move to identification of the true causal variants underlying GWAS associations, through fine mapping and functional studies, their effect sizes will tend to increase, in a minority of cases substantially, compared to current estimates from GWAS. This will, in turn, increase the amount of heritability explained by these loci. We can use the approach developed here to try to quantify this effect.
We investigated this question in the context of the three diseases just described, namely breast cancer, type 2 diabetes, and Crohn's disease. For each disease we took the set of hit SNPs from published associated loci (see Tables S3), and for our two prior distributions on effect sizes we estimated the posterior distribution of both the effect size and the allele frequency for the causal SNP at each locus, as described in the previous section. One commonly used measure of heritability is the sibling recurrence risk ratio, often denoted by λ_S: the relative increase in risk to an individual if their sibling has the disease compared to the baseline risk in the population as a whole. Assuming, as is usual for heritability calculations, that there is no interaction between loci, λ_S can be calculated as a function of the risk allele frequency and effect size for each causal variant. In order to allow for the uncertainty in the allele frequency and likely underestimation of the effect size at the causal variants underlying GWAS associations, we averaged this expression over the posterior distribution of these quantities, given the GWAS findings (see Methods).
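As an illustration of how λ_S depends on allele frequency and effect size, the sketch below uses a commonly quoted expression for a biallelic locus with multiplicative per-allele relative risk r and risk allele frequency p under Hardy-Weinberg equilibrium, λ_S = (1 + p(1−p)(r−1)² / (2(1−p+rp)²))², and multiplies contributions across loci on the no-interaction assumption. Whether this matches the exact expression in the study's Methods is an assumption of this example, and the function names are hypothetical.

```python
def locus_lambda_s(raf, rr):
    """Sibling recurrence risk ratio contributed by one locus, assuming a
    multiplicative per-allele relative risk and Hardy-Weinberg equilibrium."""
    q = 1.0 - raf
    term = raf * q * (rr - 1.0) ** 2 / (2.0 * (q + raf * rr) ** 2)
    return (1.0 + term) ** 2

def combined_lambda_s(loci):
    """Combine loci multiplicatively (no interaction between loci).

    loci: list of (risk_allele_frequency, per_allele_relative_risk) pairs.
    """
    total = 1.0
    for raf, rr in loci:
        total *= locus_lambda_s(raf, rr)
    return total

def posterior_mean_lambda_s(posterior):
    """Average the single-locus lambda_S over a discrete posterior.

    posterior: list of (probability, raf, rr) triples summing to 1.
    """
    return sum(w * locus_lambda_s(raf, rr) for w, raf, rr in posterior)
```

Because the expression grows with (r−1)², posterior mass on larger true effect sizes raises the averaged λ_S even when the most likely effect size is unchanged.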
The results are shown in the accompanying figure (Adjusted estimates of explained heritability). For each disease they show that the heritability due to already identified GWAS loci will be higher than current estimates, under either set of assumptions about true effect sizes, but particularly under the MAF-dependent prior. Whereas at the time of writing the current estimates of the contribution to λ_S from GWAS loci are 1.03, 1.08, and 1.49 for breast cancer, type 2 diabetes, and Crohn's disease, these may well be 1.06, 1.14, and 1.61 (mean under the conservative prior) and they could plausibly be as high as 1.21, 1.39 and 2.46 (mean under the MAF-dependent prior). Whilst some of the “missing” heritability is thus disguised rather than missing, we note that this effect is unlikely to account for the extent of the gap between estimates of sibling relative risk (2, 1.8, and 10, respectively, from family studies) and those explained by currently known loci. We return below to a discussion of the discrepancy.
Adjusted estimates of explained heritability.