A report by Seliger et al. on statin use and risk of glioma prompted Greenland to write a letter to the editor in which he again explains why lack of statistical significance must not be interpreted as lack of association [1, 2]. Greenland and colleagues have also stressed recently that a statistically significant association may very well be due to chance. Both statements hold regardless of whether the judgment of statistical significance rests on the P value being smaller than 5% or on the confidence interval excluding the no-effect value.
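The duality between the two criteria can be made concrete with a minimal sketch, not taken from any of the cited articles: for a Wald test of a log odds ratio, the two-sided P value falls below 0.05 exactly when the 95% confidence interval excludes 1 (the odds ratio and standard error below are illustrative values, not data from any study).

```python
# Sketch of the P value / confidence interval duality for an odds ratio.
# Assumes a Wald test on the log scale; numbers are purely illustrative.
from math import exp, log, sqrt, erf

def wald_summary(odds_ratio: float, se_log_or: float):
    """Return the two-sided P value and 95% CI for an odds ratio."""
    z = log(odds_ratio) / se_log_or
    # Two-sided tail probability of the standard normal, via erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    lo = exp(log(odds_ratio) - 1.96 * se_log_or)
    hi = exp(log(odds_ratio) + 1.96 * se_log_or)
    return p_value, (lo, hi)

p, (lo, hi) = wald_summary(odds_ratio=1.25, se_log_or=0.12)
print(f"P = {p:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# Here P ~ 0.063 and the interval (0.99, 1.58) includes 1:
# the two criteria agree, as they must (up to rounding of 1.96).
```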
The consequences of dividing results into the two separate categories "statistically significant" and "non-significant" have been discussed extensively. The topic is part of the curriculum in courses and appears in textbooks. Journals have published editorial comments, and expert groups with various remits have provided guidance. Individual scientists have discussed the problem in commentaries like the current one and in more comprehensive formats, and letters to the editor other than Greenland's have pointed to problematic use of the significance concept. For an extensive list of references, see a recent article in EJE.
Yet the reporting style that points out whether associations are significant or not remains common. Although there is at most a marginal difference between a lower confidence bound of 0.99 and one of 1.01 (illustrated in the sketch below), we have all noticed disappointed faces when results start to appear and it becomes clear that the figures will not quite reach statistical significance, and correspondingly happy faces in the opposite case. A mechanism by which chance could be taken out of the equation, freeing the researcher to focus on systematic errors and biologic plausibility when assessing causality, would have been a great gift to the research community. But significance testing of null hypotheses was not designed to serve this purpose. It was developed as a decision-making tool, and decisions are rarely made from the outcome of a single study.
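A hypothetical illustration of how thin that margin is: two results with nearly identical odds ratios and identical standard errors (invented numbers, not from any study) land on opposite sides of the "significance" line only because their lower confidence bounds fall just below and just above 1.

```python
# Illustrative only: dichotomizing at the 95% CI bound splits two
# near-identical results into different categories.
from math import exp, log

def ci95(odds_ratio: float, se_log_or: float):
    """95% Wald confidence interval for an odds ratio."""
    return (exp(log(odds_ratio) - 1.96 * se_log_or),
            exp(log(odds_ratio) + 1.96 * se_log_or))

for or_, se in [(1.28, 0.130), (1.30, 0.130)]:
    lo, hi = ci95(or_, se)
    label = "significant" if lo > 1 else "not significant"
    print(f"OR {or_:.2f}: CI ({lo:.2f}, {hi:.2f}) -> {label}")
# OR 1.28: CI (0.99, 1.65) -> not significant
# OR 1.30: CI (1.01, 1.68) -> significant
```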
A meta-analysis that offers new insights into ways of reporting study results is published in the current issue of the European Journal of Epidemiology. It estimates trends in the usage of P values, significance tests, and confidence intervals in close to 90,000 articles published in five general medical journals and seven epidemiology journals. The basis is the computerized abstracts in PubMed, which allows for the large study size but limits the information to the wording of the abstracts. The key findings are that confidence intervals presented in their own right, and not as proxies for statistical tests, are becoming more common, particularly in epidemiology journals, and that although significance testing is becoming less popular in most epidemiology journals and in some widely read medical journals, it is still very common in some prominent medical journals. While these results signal a positive trend, particularly among epidemiology journals, it is worth noting that, judging from the abstracts, still only about 40% of the articles in epidemiology journals rely solely on confidence intervals for assessing the precision of the reported estimates. In the selected medical journals this figure was about 20%.
A few things are immediately clear. First, editorial policy plays a role, as evidenced by the position taken by Epidemiology: confidence intervals have always been the predominant mode of reporting in that journal as the result of an editorial policy in place from its start. Second, there is a clear difference between epidemiology journals and medical journals, with considerably more reliance on confidence intervals in the epidemiology journals; the data don't allow a comparison restricted to reports of epidemiological studies. Third, the prevalence of statistical significance testing varies across medical journals and, in particular, persists over time in the high-impact journals JAMA, NEJM, and Lancet.
Thus, this meta-analysis informs us about the prevalence of and trends in the usage of significance testing, but we still don't know why this is so often the reporting style of choice. Nor do we know the reasons for the differences in this respect between types of journals and between individual journals. A list of candidate explanations is provided below. They reflect a blend of factors: compliance with perceived or real expectations from the surrounding research community, convenience on the part of the researcher when reporting results, and ignorance.
This is the list: