We read the paper by Huak1 with great interest. The author is to be congratulated for his views on biostatistics. Although the paper is interesting, some considerations should be addressed.
Huak1 suggested that when statistical significance was reached, the manuscript stood a better chance of getting published (scenarios 1 and 3 in Table 3). In contrast, a study with negative (statistically insignificant) results would have a lower possibility of getting published if its probability value (P-value) was >.05 (scenario 2). A study that lacks both clinical and statistical significance would not merit inclusion in the literature (scenario 4).
However, this doctrine seems to be wrong. It contributes to the so-called ‘positive outcome bias’ or ‘pipeline bias’, a common form of ‘publication bias (PB)’. PB is the tendency of investigators, reviewers, and editors to submit or accept manuscripts for publication based on the direction or strength of the findings of quantitative studies, and it thereby influences the chances of publication. It can occur at any step before a research paper is published. The positive outcome bias decreases the likelihood of a manuscript being published when its results are near the null, not statistically significant, or otherwise less interesting.2,3
Several studies including Cochrane reviews demonstrated that studies with positive findings were given priority in publication compared with those with inconclusive or invalidating results or with findings contrary to the study hypotheses.2 This preference can mislead readers about the effectiveness of the reported therapy2 and inflate the rate of type I (false-positive) error of a meta-analysis.4
The Declaration of Helsinki (Article 30) clearly states that ‘Authors, editors and publishers all have ethical obligations with regard to the publication of the results of research. Authors have a duty to make the results of their research on human subjects publicly available and are accountable for the completeness and accuracy of their reports. They should adhere to accepted guidelines for ethical reporting. Negative and inconclusive as well as positive results should be published or otherwise made publicly available’.5 Hence, it is the moral responsibility of researchers, funders and journals to disseminate research findings, regardless of the outcome. Authors and funders should not preferentially submit only studies reporting positive results. Meanwhile, journals should implement the ‘must have’ measures to diminish PB in their selection processes. In this way, scientific integrity will be upheld and maintained.2 Bias in the dissemination, publication, interpretation and review of scientific findings is considered ‘scientific misconduct’.6 For details on publication bias, we refer to our recent publication.2
Moreover, the P-value does not provide a good measure of the strength of evidence against the null hypothesis of no difference (no association between a characteristic and an outcome), even though it is often interpreted in this way.7 A small P-value signifies that the evidence in favour of the null hypothesis is weak and that the probability of observing such differences by chance alone is so small that the null hypothesis is unlikely to be true.3,8–10 The rejection of the null hypothesis (when the P-value is <.05) must be based on the limitations/assumptions that (1) there is up to a 5% chance of a type I error of finding a difference where there is none, (2) there is a 50% chance of a type II error of finding no difference where there is one, (3) the data are normally distributed, and (4) they follow exactly the same distribution as that of the population from which the sample was taken.10
Conversely, a P-value of >.05 only indicates that the evidence is inadequate to reject the null hypothesis, so the alternative hypothesis (the opposite of the null hypothesis), that the observed differences between the groups are real and not due to chance, is not accepted. As a consequence, chance cannot be excluded as an explanation for the study results.3,7–11 However, this does not imply that the null hypothesis is true, or that the test treatment and control (e.g. standard treatment, placebo, or baseline) in the study are equivalent.3 The study itself may have a weakness, such as a sample size too small to detect a clinically important difference as statistically significant.9 For example, a P-value of 0.08, albeit not significant, does not mean ‘nil’. It means that, if the null hypothesis were true, a difference at least as large as the one observed would still be expected about 8% of the time.7 A P-value alone cannot be used to accept or reject the null hypothesis. The cut-off level of 0.05 is purely arbitrary and gives no indication as to the clinical significance of any observed differences.3,7–9
There is not much difference between a P-value of 0.055 and a P-value of 0.045. Small changes in sample size can tilt the P-value from one side of the cut-off to the other.8,9 Any small difference will be statistically significant (P<.05) if the sample size is large enough, regardless of the clinical relevance. In contrast, any large difference, no matter how clinically important, will not be statistically significant (P>.05) if the sample size is too small. Hence, a low P-value in a small study carries more evidential weight than the same P-value in a large study. However, such reliance on the P-value will increase the effect of PB.3,7,9
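The dependence of the P-value on sample size can be illustrated with a minimal sketch. It assumes a two-sided z-test for a difference in means between two equal-sized groups with a known common standard deviation; the function name `two_sided_p` and all numerical values are hypothetical and chosen only for illustration, not taken from any study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided P-value for a difference in means between two equal-sized
    groups, using a z-test (assumes a known, common SD and normality)."""
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical values: a trivial difference (0.1 SD) and a large one (1 SD).
p_small_effect_small_n = two_sided_p(0.1, 1.0, 100)
p_small_effect_large_n = two_sided_p(0.1, 1.0, 10_000)
p_large_effect_tiny_n = two_sided_p(1.0, 1.0, 5)

print(f"0.1 SD difference, n = 100 per group:   P = {p_small_effect_small_n:.3f}")
print(f"0.1 SD difference, n = 10000 per group: P = {p_small_effect_large_n:.2e}")
print(f"1.0 SD difference, n = 5 per group:     P = {p_large_effect_tiny_n:.3f}")
```

Under these assumptions, the same trivial difference moves from non-significant to significant purely through a larger sample, whereas the large difference remains non-significant in a very small sample.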
Hypothesis testing using a P-value reduces the result to a binary decision (yes/no), so it is not reliable. If we test 1000 null hypotheses, of which only 10% are false, at the levels of α (probability of type I error, which is the probability of rejecting a correct null hypothesis) = 5% and power (1 – β, which is the probability of correctly rejecting a false null hypothesis) = 80%, there will be 80 true-positive and 45 false-positive results among the 125 significant ones, that is, 64% true positives and 36% false positives. This means that 36% of the significant P-values will not reflect true differences between the 2 treatments.11
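The arithmetic behind these percentages can be retraced with a short sketch. The function name `significant_result_breakdown` is hypothetical; the inputs are the figures quoted above (1000 tests, 10% false null hypotheses, α = 5%, power = 80%).

```python
def significant_result_breakdown(n_tests, frac_false_null, alpha, power):
    """Split the expected significant results into true and false positives."""
    n_false_null = n_tests * frac_false_null  # a real difference exists
    n_true_null = n_tests - n_false_null      # no real difference exists
    true_pos = power * n_false_null           # real differences detected
    false_pos = alpha * n_true_null           # type I errors
    total_sig = true_pos + false_pos
    return true_pos, false_pos, true_pos / total_sig, false_pos / total_sig


tp, fp, tp_share, fp_share = significant_result_breakdown(1000, 0.10, 0.05, 0.80)
print(f"true positives:  {tp:.0f} ({tp_share:.0%} of significant results)")
print(f"false positives: {fp:.0f} ({fp_share:.0%} of significant results)")
```

The share of false positives among significant results depends strongly on the assumed proportion of false null hypotheses; changing `frac_false_null` shows how quickly it grows when real differences are rare.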
To date, some journals no longer consider P-values, and many prestigious journals such as the Lancet and the British Medical Journal prefer estimation of the effect range (confidence interval, CI) to hypothesis testing.3,7 The P-value is less informative, can be deduced from the CI and conveys no information on clinical importance. A low or high P-value does not prove anything with regard to the effectiveness of an intervention: a P-value of 0.001 does not reflect a larger effect than a P-value of 0.04.8,9 Judgements on the clinical importance of a result should be based on the size of the effect seen rather than the P-value.9 This concept is in agreement with the International Committee of Medical Journal Editors (ICMJE)’s recommendations, the Consolidated Standards of Reporting Trials (CONSORT) statement, and the Quality of Reporting of Meta-analyses (QUOROM) statement.3
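How a P-value can be deduced from a CI may be sketched as follows. The sketch assumes a normally distributed estimate of a difference (null value 0) with a symmetric 95% CI; the function name `p_from_ci` and both example results are hypothetical, and ratio measures such as odds ratios would first need to be log-transformed.

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided P-value recovered from a symmetric CI for a
    difference (null value 0), assuming a normally distributed estimate."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    se = (upper - lower) / (2.0 * z_crit)             # standard error from CI width
    z = estimate / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Two hypothetical results: a large but imprecise effect,
# and a small but precisely estimated effect.
p_large_effect = p_from_ci(2.0, 0.1, 3.9)
p_small_effect = p_from_ci(0.5, 0.2, 0.8)

print(f"difference 2.0 (95% CI 0.1 to 3.9): P = {p_large_effect:.3f}")
print(f"difference 0.5 (95% CI 0.2 to 0.8): P = {p_small_effect:.3f}")
```

In this illustration the smaller effect has the lower P-value because it is estimated more precisely, which is why the size of the effect and its CI, not the P-value, should guide judgements of clinical importance.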
CIs are used to infer information about a population based on data obtained from a representative sample of that population. CIs indicate the size and direction of the effect, and their width reflects the amount of random error and the precision of the estimate. The wider the CI, the lower the precision, suggesting that more data should be collected before any firm conclusion can be drawn from the results.8 Appropriate interpretation of P-values, CIs and statistical significance has been extensively reviewed by many authors.3,7,8,10–12
Taken together, it is a gross misconception that the decision to submit, accept or reject a manuscript for publication should rely on the P-value. A study of good quality should be published, regardless of statistical significance.
I totally agree with the comments by Pitak-Arnnop et al, who emphasize the importance of clinical relevance (even without statistical significance) and the 95% confidence interval (CI) in today's publication process. This change of mindset has evolved over many years of education on research and biostatistics. I must clarify that the present article highlights the well-known bias towards publishing significant P-values, in the hope of educating researchers on this aspect: look at the clinical relevance rather than the statistical significance. Yes, today, researchers like Pitak-Arnnop et al understand that basing decisions on the P-value is poor practice, and I thank them for adding important information as a follow-up to the present article (which is constrained by length).