Recently, several articles on the P-value published in biomedical journals caught my eye. Their titles or abstracts included the words ‘misuse,’ ‘misconception,’ or ‘misinterpretation.’ Nuzzo stated that P-values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume and added that ‘the P-value was never meant to be used the way it's used today.’ Greenland et al. provided an explanatory list of 18 misinterpretations of P-values and guidelines for improving statistical interpretation. Other interesting articles include a paper by Chavalarias et al. and the accompanying editorial by Kyriacou published in the Journal of the American Medical Association, and a review paper by van Rijn et al. The concern is not limited to biomedical communities: a major statistical society has also issued a statement on P-values to sound the alarm about their misuse.
Having been a statistician at medical schools for more than 10 years, I understand very well how obsessed many researchers are with the P-values of their study results. I have often seen researchers try to turn a P-value slightly greater than 0.05, such as 0.053 or 0.06, into one less than 0.05 by deleting or adding some data. This happens, I think, because many researchers believe that results with a P-value < 0.05, which are considered ‘statistically significant,’ are truly scientifically or substantively significant. However, this is one of the most notorious misinterpretations of the P-value.
With this in mind, how can we correctly interpret P-values? To answer this question, we should understand what a P-value really is. Informally, a P-value is the probability under a specified statistical model that a statistical summary of the data would be equal to or more extreme than its observed value. I admit that this definition is not easy to understand. An important point is that a P-value should not be used as a definitive decision-making tool that yields a yes-or-no outcome. A P-value is a probability. It summarizes the incompatibility between a particular set of data and a proposed model for the data (the null hypothesis). Ronald Fisher, who popularized the P-value, intended it as an informal way to judge whether evidence was significant in the sense of being worthy of a second look. A very small P-value indicates that the data that have been collected are very incompatible with the null hypothesis. However, we cannot say with certainty that the null hypothesis is not true, or that the alternative hypothesis must be true.
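To make the informal definition concrete, here is a minimal sketch in Python. The scenario is invented for illustration and is not taken from any of the cited papers: a coin is flipped 20 times and lands heads 15 times, and the specified model (the null hypothesis) is a fair coin. The function name `binomial_p_value` is hypothetical. The P-value is the probability, under that model, of seeing 15 or more heads.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """One-sided P-value: P(X >= successes) when X ~ Binomial(trials, null_prob)."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 15 heads in 20 flips, tested against a fair coin.
p_one_sided = binomial_p_value(15, 20)
print(f"One-sided P-value: {p_one_sided:.4f}")  # prints 0.0207
```

The result says that a fair coin would produce 15 or more heads in about 2% of such experiments. It does not say that there is a 2% chance the coin is fair, and it says nothing about how biased the coin might be.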
Another important fact is that the P-value does not measure the magnitude or the importance of an observed effect. I have been surprised to see many researchers interpret a risk ratio of 0.59 with a P-value of 0.16 as non-significant or ‘no difference,’ while stating that a risk ratio of 0.83 with a P-value of 0.002 is highly significant. As argued by Wasserstein and Lazar, statistical significance is not equivalent to scientific, human, or economic significance. One must not interpret the results solely by the P-value. A small P-value could be simply due to a very large sample size, even when the effect size is small. A P-value > 0.05 does not mean that no effect was observed, or that the effect size was small. One must look at the effect size and uncertainty measures (e.g., standard error and confidence interval) to evaluate whether the results are clinically or scientifically relevant.
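The interplay between sample size, effect size, and the P-value can be sketched with a short calculation. The counts below are hypothetical, chosen only so that the risk ratios come out near 0.59 and 0.83; they are not the data behind the examples above. The sketch assumes a Wald test on the log risk ratio with a normal approximation, which is one common choice among several, and the function name `risk_ratio_summary` is invented for this illustration.

```python
from math import erfc, exp, log, sqrt


def risk_ratio_summary(events_a, n_a, events_b, n_b):
    """Risk ratio, 95% Wald confidence interval, and two-sided Wald P-value."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Large-sample standard error of log(RR).
    se = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = log(rr) / se
    # Two-sided tail area of a standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    ci = (exp(log(rr) - 1.96 * se), exp(log(rr) + 1.96 * se))
    return rr, ci, p_value


# Hypothetical counts (events, group size), invented for illustration only.
small_study = risk_ratio_summary(10, 50, 17, 50)
large_study = risk_ratio_summary(830, 10_000, 1_000, 10_000)

for label, (rr, (low, high), p) in [("small", small_study), ("large", large_study)]:
    print(f"{label}: RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}, P = {p:.3g}")
```

Under these assumptions, the small study shows the larger observed effect but a P-value above 0.05 and a wide confidence interval that includes 1, while the large study shows a more modest effect with a very small P-value and a narrow interval. Reading the P-values alone would hide both facts.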
Now, how do we avoid misusing P-values? One journal even prohibits P-values, saying that the null hypothesis significance testing procedure is invalid. I think this is a rather extreme action. Still, it is collateral evidence of the rampant misuse of the P-value. As many statisticians have said, the P-value itself is not the problem. One should clearly understand what a P-value really means and should not judge the results of a study or experiment by relying only on the P-value. There are other approaches that can supplement P-values, such as confidence, credibility, or prediction intervals; Bayesian methods; and alternative measures of evidence such as likelihood ratios or Bayes factors. Chavalarias et al. also recommended that rather than reporting isolated P-values, articles should include effect sizes and uncertainty metrics. I strongly encourage the readers of this article to read the papers listed in the references (and other relevant papers as well), which will lead them to a deeper understanding of P-values and other important statistical concepts.
No potential conflict of interest relevant to this article was reported.