The findings of medical research are often met with considerable scepticism, even when they have apparently come from studies with sound methodologies that have been subjected to appropriate statistical analysis. This is perhaps particularly true of epidemiological findings suggesting that some aspect of everyday life is bad for people. Indeed, one recent popular history, The Rise and Fall of Modern Medicine by the medical journalist James Le Fanu, went so far as to suggest that the solution to medicine's ills would be the closure of all departments of epidemiology.1
One contributory factor is that the medical literature shows a strong tendency to accentuate the positive: positive outcomes are more likely to be reported than null results.2–4 By this means alone a host of purely chance findings will be published, because by conventional reasoning, examining 20 associations for which there is no true effect will on average produce one result that is “significant at P=0.05” by chance alone. If only positive findings are published, they may be mistakenly considered important rather than being recognised as the inevitable chance results produced by applying criteria for meaningfulness based on statistical significance. As many studies contain long questionnaires collecting information on hundreds of variables, and measure a wide range of potential outcomes, several false positive findings are virtually guaranteed. However, publication bias is not the only reason why medical research findings are so numerous and so often contradictory.5 A more fundamental problem is the widespread misunderstanding of the nature of statistical significance.
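The arithmetic behind the "one in 20" argument can be sketched in a few lines. This is a minimal illustration, not a method from this paper. It assumes that every null hypothesis is true and, for the second function only, that the tests are independent. The function names `expected_false_positives` and `prob_at_least_one` are hypothetical.

```python
# Sketch: chance "significant" results among many tests,
# assuming every null hypothesis is true (no real associations at all).


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of tests reaching P < alpha by chance alone."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n independent null tests has P < alpha."""
    # Assumes independence: each test "misses" with probability (1 - alpha)
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    for n in (1, 20, 200):
        print(f"{n:>4} tests: expected {expected_false_positives(n):5.2f} "
              f"chance findings, P(at least one) = {prob_at_least_one(n):.4f}")
```

With 20 tests, one chance finding is expected on average, and the probability of at least one is about 0.64. With 200 tests, a plausible number for a long questionnaire, about ten are expected. The expected count does not depend on independence. If the variables are correlated, however, the probability of at least one chance finding will differ from the value computed here.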
- P values, or significance levels, measure the strength of the evidence against the null hypothesis; the smaller the P value, the stronger the evidence against the null hypothesis
- An arbitrary division of results, into “significant” or “non-significant” according to the P value, was not the intention of the founders of statistical inference
- A P value of 0.05 need not provide strong evidence against the null hypothesis, but it is reasonable to say that P<0.001 does. In the results sections of papers the precise P value should be presented, without reference to arbitrary thresholds
- Results of medical research should not be reported as “significant” or “non-significant” but should be interpreted in the context of the type of study and other available evidence. Bias or confounding should always be considered for findings with low P values
- To stop the discrediting of medical research by chance findings we need more powerful studies
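Reporting a precise P value rather than a "significant/non-significant" label can be sketched as follows. This is an illustration under stated assumptions, not the guideline proposed in this paper. It assumes a test statistic z that is approximately standard normal under the null hypothesis, and the names `two_sided_p` and `format_p` are hypothetical. The cut-offs of 0.0001 and 0.99 used for display are arbitrary choices made here for readability.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a test statistic z that is standard normal under the null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2))


def format_p(p: float) -> str:
    """Report a P value as a number rather than as 'significant'/'non-significant'."""
    if p < 0.0001:
        return "P<0.0001"  # arbitrary display floor (an assumption of this sketch)
    if p > 0.99:
        return "P>0.99"    # avoids rounding up to "P=1"
    return f"P={p:.2g}"    # two significant figures, e.g. P=0.012


if __name__ == "__main__":
    for z in (1.0, 1.96, 2.5, 3.29):
        print(f"z={z}: {format_p(two_sided_p(z))}")
```

The output shows that z=1.96 corresponds to P=0.05 and z=3.29 to P=0.001. Moving from the conventional threshold to a level that does provide strong evidence against the null hypothesis therefore requires a substantially larger test statistic.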
In this paper we consider how the practice of significance testing emerged, and argue that an arbitrary division of results into “significant” or “non-significant” (according to the commonly used threshold of P=0.05) was not the intention of the founders of statistical inference. P values need to be much smaller than 0.05 before they can be considered to provide strong evidence against the null hypothesis; this implies that more powerful studies are needed. Reporting of medical research should continue to move away from classifying results as significant or non-significant and towards interpreting findings in the context of the type of study and other available evidence. Editors of medical journals are in an excellent position to encourage such changes, and we conclude with proposed guidelines for reporting and interpretation.
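One way to see why requiring much smaller P values implies more powerful, and therefore larger, studies is to apply a standard normal-approximation sample size formula. This sketch is not taken from this paper. It assumes a two-group comparison of means with a common standard deviation, 90% power, and a standardised difference of 0.5 (difference in means divided by the SD). The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(alpha: float, power: float, effect_size: float) -> int:
    """Approximate sample size per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2,
    where d is the standardised difference (difference in means / common SD).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    for alpha in (0.05, 0.001):
        n = n_per_group(alpha, power=0.9, effect_size=0.5)
        print(f"alpha={alpha}: about {n} per group")
```

Under these assumptions, detecting the same effect at P<0.001 rather than P<0.05 roughly doubles the required size of each group (about 168 versus 85).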