Research findings often fail to replicate, most notably in the field of genetic associations [1–3]. For example, a survey of 600 positive associations between gene variants and common diseases showed that out of 166 reported associations studied three or more times, only six were replicated consistently. Lack of replication results from a number of factors such as publication bias, selection bias, Type I errors, population stratification (the mixture of individuals from heterogeneous genetic backgrounds), and lack of statistical power.
In a recent article in PLoS Medicine, John Ioannidis quantified the theoretical basis for lack of replication by deriving the positive predictive value (PPV) of the truth of a research finding on the basis of a combination of factors. He showed elegantly that most claimed research findings are false. One of his findings was that the more scientific teams involved in studying the subject, the less likely the research findings from individual studies are to be true. The rapid early succession of contradictory conclusions is called the “Proteus phenomenon”. For several independent studies of equal power, Ioannidis showed that the probability of a research finding being true when one or more studies find statistically significant results declines with increasing number of studies.
Replication, the performance of another study statistically confirming the same hypothesis, is the cornerstone of science, and findings must be replicated before any causal inference can be drawn. While Ioannidis also acknowledges the importance of replication, he does not show how the PPV of a research finding increases when more studies have statistically significant results. In this essay, we demonstrate the value of replication by extending Ioannidis' analyses to the calculation of the PPV when multiple studies show statistically significant results.
The probability that a study yields a statistically significant result depends on the nature of the underlying relationship. The probability is 1 - β (one minus the Type II error rate) if the relationship is true, and α (the Type I error rate) if the relationship is false, i.e., there is no relationship. Similarly, the probability that r out of n studies yield statistically significant results also depends on whether the underlying relationship is true or not. Let B(p,r,n) denote the probability of obtaining at least r statistically significant results out of n independent and identical studies, with p being the probability of a statistically significant result. B(p,r,n) is the upper tail of the binomial distribution, calculated as

$$B(p,r,n) = \sum_{i=r}^{n} \binom{n}{i}\, p^{i} (1-p)^{n-i}$$
In this formula, p is 1 - β when the underlying relationship is true and α when it is false. Let R be the pre-study odds and c be the number of relationships being probed in the field. The pre-study probability of a relationship being true is given by R/(R + 1). The expected values of the 2 × 2 table are given in Table 1. When r is equal to one, entries in Table 1 are identical to those in Table 3 of Ioannidis. The probability that, in the absence of bias, at least r out of n independent studies find statistically significant results is given by (RB(1 - β,r,n) + B(α,r,n))/(R + 1), and the PPV when at least r studies are statistically significant is RB(1 - β,r,n)/(RB(1 - β,r,n) + B(α,r,n)).
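The quantities defined above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' own code: the function names (`binom_tail`, `prob_significant`, `ppv`) and the parameter names `power` (for 1 - β) and `alpha` (for α) are our own assumptions, and B(p,r,n) is taken to be the binomial upper tail implied by its definition as the probability of at least r significant results among n independent, identical studies.

```python
from math import comb


def binom_tail(p: float, r: int, n: int) -> float:
    """B(p, r, n): probability of at least r significant results among
    n independent, identical studies, each significant with probability p."""
    if not 0 <= r <= n:
        raise ValueError("r must satisfy 0 <= r <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, n + 1))


def prob_significant(R: float, power: float, alpha: float, r: int, n: int) -> float:
    """Probability that at least r of n studies are significant, in the
    absence of bias: (R*B(1-beta, r, n) + B(alpha, r, n)) / (R + 1)."""
    return (R * binom_tail(power, r, n) + binom_tail(alpha, r, n)) / (R + 1)


def ppv(R: float, power: float, alpha: float, r: int, n: int) -> float:
    """PPV when at least r of n studies are significant:
    R*B(1-beta, r, n) / (R*B(1-beta, r, n) + B(alpha, r, n))."""
    true_part = R * binom_tail(power, r, n)   # relationship is true
    false_part = binom_tail(alpha, r, n)      # relationship is false
    return true_part / (true_part + false_part)


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    R, power, alpha = 1.0, 0.8, 0.05
    for n in (1, 2, 5):
        print(f"n={n}: PPV(at least 1 of n)={ppv(R, power, alpha, 1, n):.4f}, "
              f"PPV(all n of n)={ppv(R, power, alpha, n, n):.4f}")
```

With these illustrative inputs, a single significant study (r = 1, n = 1) gives a PPV of 0.8/0.85, or about 0.94. Holding r = 1 and increasing n lowers the PPV, which matches the decline Ioannidis described, while requiring all n studies to be significant raises it.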