In genetic association studies, the genetic effect size for associated markers tends to be overestimated as a consequence of the winner's curse. This bias arises from the strong positive correlation between the association test statistic and the estimator of the genetic effect, combined with the focus of investigators on markers that show statistically significant evidence of association. In this paper, we studied the bias of the naïve maximum likelihood estimators for the allele frequency difference and the odds ratio that ignore this ascertainment; these measures are routinely used to estimate the strength of the effect in genetic association studies. We demonstrated that the proportional bias in the estimators decreases as power increases. Interestingly, at a fixed significance level, the proportional biases of the allele frequency difference and the logarithm of the odds ratio are functions of power, and otherwise are essentially independent of allele frequency or sample size (see also [Zöllner and Pritchard, 2007]).
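The dependence of proportional bias on power can be illustrated with a simplified calculation. The sketch below is not the exact calculation used in this paper. It assumes that the standardized estimator Z = δ̂/SE is normally distributed with mean μ = δ/SE and unit variance, and that a marker is reported only when |Z| exceeds the two-sided critical value for significance level α. The function name `power_and_proportional_bias` and the parameter `mu` are introduced here for illustration only. Under this assumption, allele frequency and sample size enter only through μ, so at fixed α the proportional bias is a function of power alone.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and variance 1


def power_and_proportional_bias(mu, alpha):
    """Power and proportional bias of the naive estimator given significance.

    mu    : assumed true effect divided by its standard error (hypothetical
            parameter name; normal approximation, not the exact model).
    alpha : two-sided significance level.
    """
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for |Z|
    # P(|Z| > c) when Z ~ N(mu, 1), written with lower tails for accuracy
    power = STD_NORMAL.cdf(mu - c) + STD_NORMAL.cdf(-mu - c)
    # E[Z | |Z| > c] for a normal truncated to the two rejection tails
    cond_mean = mu + (STD_NORMAL.pdf(c - mu) - STD_NORMAL.pdf(c + mu)) / power
    return power, (cond_mean - mu) / mu


if __name__ == "__main__":
    for mu in (2, 3, 4, 5, 6, 7):
        power, bias = power_and_proportional_bias(mu, alpha=1e-6)
        print(f"mu={mu}: power={power:.4f}, proportional bias={bias:.3f}")
```

Running the loop shows the proportional bias falling steadily as power rises, which is the qualitative pattern described above.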

We proposed a maximum likelihood method to correct for this ascertainment bias. The ascertainment-corrected MLEs for both the allele frequency difference and the (log) odds ratio are generally less biased than the uncorrected estimators unless study power is moderate to high (>60%). Since large-scale genetic association studies of complex traits typically are underpowered owing to small genetic effect sizes, our method should generally provide a more accurate estimate of genetic effect size in the context of genome-wide association studies and large-scale candidate gene studies. In high-power situations, the bias of both the naïve and corrected estimators is small, so that ascertainment correction again is reasonable. The proportional bias of the corrected and uncorrected estimators for both the allele frequency difference and the odds ratio does show modest dependence on the significance level α. For example, when α = 10^{-4}, biases for all estimators are somewhat increased compared to the case of α = 10^{-6}, and the advantage of ascertainment correction is increased slightly.
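The idea of maximizing a likelihood that conditions on significance can be sketched under a simplifying assumption. The code below is not the estimator developed in this paper, which works with the case and control allele counts. It instead assumes that Z = δ̂/SE is normal with mean μ and unit variance and that the standard error is known. With that assumption, the density of Z given |Z| > c is a one-parameter exponential family in μ, so the conditional MLE is the value of μ at which the conditional mean of Z equals the observed z, and it can be found by bisection. The names `conditional_mean` and `corrected_estimate` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_mean(mu, c):
    """E[Z | |Z| > c] for Z ~ N(mu, 1) (normal approximation, an assumption)."""
    power = STD_NORMAL.cdf(mu - c) + STD_NORMAL.cdf(-mu - c)
    return mu + (STD_NORMAL.pdf(c - mu) - STD_NORMAL.pdf(c + mu)) / power


def corrected_estimate(estimate, se, alpha, tol=1e-10):
    """Conditional-likelihood estimate of the effect, given significance.

    estimate : naive estimate (for example, an allele frequency difference).
    se       : its standard error, treated as known (an assumption).
    alpha    : two-sided significance level used to select the marker.
    """
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z = abs(estimate) / se
    if z <= c:
        raise ValueError("estimate is not significant at level alpha")
    # conditional_mean is 0 at mu = 0, increases with mu, and exceeds z
    # at mu = z, so the solution lies in (0, z).
    lo, hi = 0.0, z
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if conditional_mean(mid, c) < z:
            lo = mid
        else:
            hi = mid
    mu_hat = (lo + hi) / 2
    return math.copysign(mu_hat * se, estimate)


if __name__ == "__main__":
    for naive in (0.050, 0.055, 0.070, 0.100):
        print(naive, corrected_estimate(naive, se=0.01, alpha=1e-6))
```

In this sketch the corrected estimate is always shrunk toward zero relative to the naïve one. The shrinkage is strong when the test statistic only just exceeds the significance threshold and negligible when it exceeds the threshold by a wide margin.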

Zöllner and Pritchard [2007] used simulations to evaluate the impact of the winner's curse effect in genetic association studies and also proposed a maximum likelihood method to correct for it. Their method estimates the frequencies of all genotypes and the corresponding penetrance parameters based on a known population prevalence of the disease under different inheritance models. In contrast, our method is simpler and focuses solely on the parameters of greatest interest: the allele frequency difference and the odds ratio. This simplicity does require the assumption of Hardy-Weinberg equilibrium in our case and control samples. Such an assumption is entirely reasonable given the modest locus effect sizes for complex traits, but would not be reasonable in the context of a Mendelian major locus.

Our corrected MLEs for the allele frequency difference and odds ratio generally underestimate the true genetic effects [Zöllner and Pritchard, 2007]. Using computer simulation, we note that the empirical distribution of our corrected MLEs can reasonably be described as a two-component mixture, with one component near zero and the other appearing more nearly normal. The corresponding figure illustrates this for the ascertainment-corrected estimator of the allele frequency difference. As power increases, the distribution becomes more nearly normal, and the asymptotic unbiasedness of the MLE comes into play.

We investigated the coverage of the asymptotic 95% confidence intervals for the naïve and ascertainment-corrected MLEs of the allele frequency difference δ. The coverage of the ascertainment-corrected interval ranged from 82-100% for the cases we considered, reflecting the distribution and the bias of the ascertainment-corrected MLE. This was still generally better than the coverage for the naïve estimator, which ranged from 0-92%.

Given the usual downward bias of our ascertainment-corrected estimators, one could consider an *ad hoc* bias correction. For the estimators of the allele frequency difference δ and the log odds ratio lnOR, the downward bias is 5-20% across the situations we considered (control allele frequency 0.1-0.5, allele frequency difference δ = 0.018-0.159 (OR 1.11-2.30), case and control sample sizes 250 to 2,000, and statistical significance 10^{-4} to 10^{-8}), so that multiplying the resulting estimate by 1.05-1.10 would generally reduce absolute bias. However, such an approach is counterproductive when power is very low (<0.005). The same criticism holds for taking a (weighted) average of the corrected and uncorrected estimators. More appealing might be to use an alternative estimation approach, and we currently are considering an empirical Bayes method [Carlin and Louis, 2000] that uses information from genome-wide association studies to help define a prior distribution for the genetic effect size.
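The arithmetic behind the multiplicative adjustment can be checked directly. The sketch below assumes, purely for illustration, that the corrected estimator has expectation equal to (1 - b) times the true value for a fixed downward bias b. The function name `residual_bias` is hypothetical. It computes the proportional bias that remains after multiplying by a constant factor.

```python
def residual_bias(downward_bias, factor):
    """Proportional bias remaining after multiplying by `factor`.

    Assumes the estimator's expectation is (1 - downward_bias) * truth,
    which is an illustrative simplification.
    """
    return (1 - downward_bias) * factor - 1


if __name__ == "__main__":
    for downward_bias in (0.05, 0.10, 0.20):
        for factor in (1.05, 1.10):
            after = residual_bias(downward_bias, factor)
            print(f"bias before: -{downward_bias:.2f}, factor {factor:.2f}, "
                  f"bias after: {after:+.4f}")
```

For every combination of a 5-20% downward bias and a factor of 1.05-1.10, the remaining bias is smaller in absolute value than the starting bias, although at the 5% end a factor of 1.10 overshoots and leaves a small upward bias.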

Realistically, precise and unbiased estimation of genetic effect size will best be obtained by collecting a large sample specifically for this purpose, should resources be available to do so. However, given a sample in which an association is discovered, our ascertainment-corrected approach provides more accurate estimation of the allele frequency difference and odds ratio than the naïve approach, and permits better design of subsequent replication studies or studies focused on estimating the population effect of the identified variant(s). Standard errors for the ascertainment-corrected MLEs were substantially larger than those for the naïve estimator based on an independent random sample of the same size, correctly reflecting the loss of information when estimation is based on the same sample used for association detection.

In summary, we have presented analytic calculations that quantify the impact of the winner's curse in large-scale genetic association studies, and confirm that in realistic situations, it can result in substantial overestimation of the true genetic effect as measured by the case-control allele frequency difference or the corresponding odds ratio. We propose a maximum likelihood estimator that corrects for the typical focus on statistically significant results, and demonstrate that this estimator results in reduced absolute bias compared to the naïve uncorrected estimator when study power is low or moderate (<60%), a range that is typical for most large-scale genetic association studies, and similar absolute bias when power is high. Our method does not require specification of a genetic model and is easy to implement. We extended these calculations to two-stage association studies, and found similar results to those for one-stage studies. We recommend the use of this ascertainment-corrected method for estimation of genetic effect size in large-scale genetic association studies.