Statistical tests for funnel-plot asymmetry are common in meta-analyses. Inappropriate application can generate misleading inferences about publication bias. We aimed to measure, in a survey of meta-analyses, how frequently the application of these tests would be meaningless or inappropriate.
We evaluated all meta-analyses of binary outcomes with ≥ 3 studies in the Cochrane Database of Systematic Reviews (2003, issue 2). A separate, restricted analysis was confined to the largest meta-analysis in each of the review articles. In each meta-analysis, we assessed whether criteria to apply asymmetry tests were met: no significant heterogeneity, I² < 50%, ≥ 10 studies (with statistically significant results in at least 1) and ratio of the maximal to minimal variance across studies > 4. We performed 1 correlation test and 2 regression asymmetry tests and evaluated their concordance. Finally, we sampled 60 meta-analyses from print journals in 2005 that cited use of the standard regression test.
A total of 366 of 6873 (5%) and 98 of 846 meta-analyses (12%) in the wider and restricted Cochrane data sets, respectively, would have qualified for use of asymmetry tests. Asymmetry test results were significant in 7%–18% of the meta-analyses. Concordance between the 3 tests was modest (estimated κ 0.33–0.66). Of the 60 journal meta-analyses, 7 (12%) would qualify for asymmetry tests; all 11 claims for identification of publication bias were made in the face of large and significant heterogeneity.
Statistical conditions for employing asymmetry tests for publication bias are absent from most meta-analyses; yet, in medical journals these tests are performed often and interpreted erroneously.
Publication bias, the selective publication of studies based on whether results are “positive” or not, is a major threat to the validity of clinical research.1–4 This bias can distort the totality of the available evidence on a research question, which leads to misleading inferences in reviews and meta-analyses. Without up-front study registration, however, this bias is difficult to identify after the fact.5 Many tests have therefore been proposed to help identify publication bias.6
The most common approaches try to investigate the presence of asymmetry in (inverted) funnel plots.7–10 A funnel plot shows the relation between study effect size and its precision. The premise is that small studies are more likely to remain unpublished if their results are nonsignificant or unfavourable, whereas larger studies get published regardless. This leads to funnel-plot asymmetry. Although visual inspection of funnel plots is unreliable,11,12 statistical tests can be used to quantify the asymmetry.7–10 These tests have become popular: one relevant article8 has been cited more than 1000 times.
The limitations of these tests have been documented for some time. In 1994, Begg and Mazumdar7 noted that their popular rank-correlation test had limited power unless the meta-analysis was large. In 2000, Sterne and colleagues13 showed in a simulation study that the regression method described by Egger and associates8 was more powerful than the rank correlation test, although the power of either method was low for meta-analyses of 10 or fewer trials. False-positive results were found to be a major concern in the presence of heterogeneity.13,14 To reduce the problem, a modified regression test was developed,10 and several other tests have been proposed.6,15 Because they differ in their assumptions and statistical properties, discordant results can be expected with different tests.
There are situations when the use of these tests is clearly inappropriate, and others where their use is futile or meaningless. Application of these tests with few studies is not wrong, but has low statistical power. Application in the presence of heterogeneity is more clearly inappropriate, and may lead to false-positive claims for publication bias.14,16,17 When all available studies are equally large (i.e., have similar precision), the tests are not meaningful. Finally, it makes no sense to evaluate whether studies with significant results are preferentially published when no studies with significant results have been published.
Despite these limitations, these tests figure prominently in the medical literature. It would be useful to estimate how often these tests are appropriately or meaningfully applied. We therefore appraised almost 7000 meta-analyses in the Cochrane Database of Systematic Reviews to discover the extent to which tests of funnel-plot asymmetry would be inappropriate or nonconcordant. We also examined the appropriateness of the application of asymmetry testing in meta-analyses recently published in print journals.
We used issue 2, 2003, of the Cochrane Database of Systematic Reviews (n = 1669 reviews). We imported into Stata software all meta-analyses that had binary outcomes and numerical 2 × 2 table information available (n = 12 709).18 We did not consider studies where no patients in either arm of the study had an event, or all patients in both arms had an event; this eliminated 906 meta-analyses. Zero counts in one arm only were handled in the calculations via the addition of 0.5 to all data cells, which allowed an odds ratio to be calculated without distorting the data appreciably. Meta-analysis data sets were further scrutinized for similarity. When numbers of studies, patients and events were all the same and summary results were identical (to 7 digits of accuracy), the meta-analyses were considered to contain duplicate data sets and only one of them was retained: similarity checks eliminated 761 duplicate meta-analyses. We also excluded meta-analyses where only 2 studies were available (n = 4169), because correlation and regression diagnostics cannot be calculated from only 2 studies. Thus, our analysis of the wider Cochrane data set included data from 6873 meta-analyses.
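The per-study calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' Stata code: it assumes each study arrives as a 2 × 2 table (a, b = events and non-events in the experimental arm; c, d = the same for the control arm), the function name is hypothetical, and the Woolf (sum of reciprocals) variance of the log odds ratio is an assumption about how study variances were obtained.

```python
import math
from typing import Optional, Tuple


def log_odds_ratio(a: float, b: float, c: float, d: float) -> Optional[Tuple[float, float]]:
    """Log odds ratio and its variance for one study's 2 x 2 table.

    a, b: events and non-events in the experimental arm
    c, d: events and non-events in the control arm
    Returns None for studies that are dropped: no events in either arm,
    or every patient in both arms having an event.
    """
    if a + c == 0 or b + d == 0:
        return None
    # Zero count in one arm only: add 0.5 to every cell of this study's table.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    # Woolf variance (assumed); the standard error is its square root.
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


# Usage: 10/100 vs 5/100 events, then a study with zero events in one arm.
print(log_odds_ratio(10, 90, 5, 95))
print(log_odds_ratio(0, 50, 5, 45))
```

Adding 0.5 to all 4 cells, rather than only to the empty one, keeps the correction symmetric between arms.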
The data sets of these meta-analyses are not necessarily independent. Within the same systematic review, different outcomes, contrasts and analyses may be correlated. To minimize correlation, we created a separate, more restricted data set for which we selected one meta-analysis, the one with the largest number of studies, per systematic review. When the largest number of studies was equal in 2 or more of the meta-analyses, we chose the one with the largest number of subjects; if that number was also equal, we chose the one with the largest number of events. Selecting the largest meta-analysis per review also minimized, in this restricted Cochrane data set of 846 meta-analyses, the problem of asymmetry tests being inappropriate because of too few studies.
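The selection rule for the restricted data set amounts to a lexicographic comparison on (number of studies, number of subjects, number of events). The sketch below uses a hypothetical record type; when all 3 counts tie, it keeps the first meta-analysis encountered, a detail the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetaAnalysis:
    review_id: str   # hypothetical identifier of the parent systematic review
    n_studies: int
    n_subjects: int
    n_events: int


def largest_per_review(analyses: List[MetaAnalysis]) -> Dict[str, MetaAnalysis]:
    """Keep one meta-analysis per review: most studies, then most subjects, then most events."""
    best: Dict[str, MetaAnalysis] = {}
    for m in analyses:
        current = best.get(m.review_id)
        key = (m.n_studies, m.n_subjects, m.n_events)
        # Tuples compare element by element, which implements the tie-breaking order.
        if current is None or key > (current.n_studies, current.n_subjects, current.n_events):
            best[m.review_id] = m
    return best


# Usage: two meta-analyses of review R1 tie on studies and subjects; events break the tie.
sample = [MetaAnalysis("R1", 5, 300, 40), MetaAnalysis("R1", 5, 300, 55),
          MetaAnalysis("R1", 4, 900, 90), MetaAnalysis("R2", 3, 100, 10)]
print(largest_per_review(sample))
```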
For each eligible meta-analysis, we evaluated 4 aspects that bear on whether applying an asymmetry test may be meaningful or appropriate. The statistical significance of between-study heterogeneity was tested with the χ²-based Q statistic and considered significant for p < 0.10 (2-tailed);19 the extent of between-study heterogeneity was measured with the I² statistic and considered large for values of 50% or more.20 The number of included studies was noted; 10 or more was considered sufficient. To see if the difference in precision of the largest and the smallest study was sufficiently large (ratio of extreme values of variances > 4), we noted the ratio of the maximal versus minimal variance (the square of the standard error of estimates) across the included studies. Finally, we recorded whether at least one study had found formally statistically significant results (p < 0.05).
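The 4 checks can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: Q uses fixed-effect inverse-variance weights, I² is (Q − df)/Q truncated at 0, and the "at least one significant study" check is assumed to be a Wald z-test on each study's log odds ratio, since the text does not name the study-level test. Function and key names are hypothetical. The chi-square tail uses closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math
from statistics import NormalDist
from typing import Dict, List


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df (closed forms, log-space terms)."""
    h = x / 2.0
    if h == 0:
        return 1.0
    if df % 2 == 0:
        return sum(math.exp(i * math.log(h) - h - math.lgamma(i + 1)) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))
    tail += sum(math.exp((i - 0.5) * math.log(h) - h - math.lgamma(i + 0.5))
                for i in range(1, (df - 1) // 2 + 1))
    return tail


def appropriateness(log_ors: List[float], variances: List[float]) -> Dict[str, bool]:
    """Evaluate the 4 criteria for one meta-analysis (True = criterion met)."""
    k = len(log_ors)
    w = [1 / v for v in variances]                                 # inverse-variance weights
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)     # fixed-effect log OR
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_ors))   # Cochran's Q
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    z_crit = NormalDist().inv_cdf(0.975)                           # 2-sided p < 0.05
    return {
        # Heterogeneity: Q not significant at p < 0.10 and I^2 below 50%.
        "no_heterogeneity": chi2_sf(q, k - 1) >= 0.10 and i2 < 0.50,
        "at_least_10_studies": k >= 10,
        # Largest-to-smallest within-study variance ratio must exceed 4.
        "variance_ratio_above_4": max(variances) / min(variances) > 4,
        # At least one study significant by a Wald z-test (assumed study-level test).
        "any_significant_study": any(abs(y) / math.sqrt(v) > z_crit
                                     for y, v in zip(log_ors, variances)),
    }
```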
Some debate about the extent to which criteria need to be fulfilled for asymmetry tests to be meaningful or appropriate is unavoidable. Given the properties of the tests, the thresholds listed above are not very demanding. Results of analyses with alternative, even more lenient criteria are illustrated in Venn diagrams of the 4 overlapping criteria.
The odds ratio was used as the metric of choice for all the meta-analyses. We documented the degree of overlap of the criteria described above and the number of meta-analyses that would qualify, based not only upon each criterion but also on combinations thereof.
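Counting how many meta-analyses meet each criterion and each combination of criteria (the numbers behind a 4-set Venn diagram) is a tally over boolean patterns. A minimal sketch, assuming each meta-analysis has already been reduced to 4 True/False flags stored under hypothetical key names:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical names for the 4 criteria.
CRITERIA = ("no_heterogeneity", "at_least_10_studies",
            "variance_ratio_above_4", "any_significant_study")


def tally_combinations(flags: Iterable[Dict[str, bool]]) -> Tuple[Counter, Counter, int]:
    """Count meta-analyses meeting each criterion and each True/False pattern of criteria."""
    per_criterion: Counter = Counter()
    patterns: Counter = Counter()
    n = 0
    for f in flags:
        n += 1
        patterns[tuple(f[c] for c in CRITERIA)] += 1   # one Venn-diagram region per pattern
        per_criterion.update(c for c in CRITERIA if f[c])
    return per_criterion, patterns, n


# Usage: meta-analyses qualifying for asymmetry tests are those meeting all 4 criteria.
example = [dict(zip(CRITERIA, (True, True, True, True))),
           dict(zip(CRITERIA, (True, False, True, True))),
           dict(zip(CRITERIA, (False, False, False, False)))]
per, pat, total = tally_combinations(example)
print(pat[(True,) * 4], "of", total, "qualify")
```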
We evaluated each meta-analysis by means of 3 asymmetry tests: the 2 most popular tests in the literature (the Begg–Mazumdar τ rank-correlation coefficient,7 and the standard regression test of the standardized effect size [i.e., the natural logarithm of the odds ratio divided by its standard error] against its precision [the inverse of the standard error]8) and a new variant, a modified version of the regression test, which has a lower false-positive rate.10 For all tests, statistical significance was claimed for p < 0.10 (2-tailed).7,8,10 We derived inferences from these 3 tests in the entire data sets and in the subsets of meta-analyses fulfilling the appropriateness criteria already described. Pairwise concordance between the 3 tests was assessed with the κ statistic.21
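The 3 tests can be sketched as below; this is an illustration under stated assumptions, not the authors' implementation. The rank-correlation test computes Kendall's τ between fixed-effect standardized deviates and study variances, with the usual normal approximation and no tie correction. The standard regression test regresses lnOR/SE on 1/SE and tests the intercept. The modified test is assumed here to be a score-based analogue that regresses Z/√V on √V, where Z is observed minus expected events in the experimental arm and V its hypergeometric variance. For both regressions the sketch returns the intercept, its t statistic and the residual degrees of freedom; a 2-sided p-value can then be obtained from a t distribution (for example with `scipy.stats.t.sf`) and compared with 0.10.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def ols_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, int]:
    """Fit y = b0 + b1*x by least squares; return (b0, t statistic of b0, residual df)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b0 = math.sqrt(rss / (n - 2) * (1 / n + mx ** 2 / sxx))
    return b0, b0 / se_b0, n - 2


def egger_test(log_ors: Sequence[float], variances: Sequence[float]):
    """Standard regression test: lnOR/SE regressed on 1/SE; the intercept measures asymmetry."""
    se = [math.sqrt(v) for v in variances]
    return ols_intercept([1 / s for s in se], [y / s for y, s in zip(log_ors, se)])


def modified_test(tables: Sequence[Tuple[int, int, int, int]]):
    """Score-based variant (assumed form): Z/sqrt(V) regressed on sqrt(V).

    Each table is (a, b, c, d): events/non-events, experimental arm then control arm.
    """
    x, y = [], []
    for a, b, c, d in tables:
        n1, n0, m1, m0 = a + b, c + d, a + c, b + d
        n = n1 + n0
        z = a - n1 * m1 / n                          # observed minus expected events
        v = n1 * n0 * m1 * m0 / (n ** 2 * (n - 1))   # hypergeometric variance of a
        x.append(math.sqrt(v))
        y.append(z / math.sqrt(v))
    return ols_intercept(x, y)


def begg_test(log_ors: Sequence[float], variances: Sequence[float]):
    """Rank-correlation test: Kendall's tau between standardized deviates and variances."""
    w = [1 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    v_pooled = 1 / sum(w)
    dev = [(y - pooled) / math.sqrt(v - v_pooled) for y, v in zip(log_ors, variances)]
    k = len(dev)
    s = 0
    for i in range(k):
        for j in range(i + 1, k):
            prod = (dev[i] - dev[j]) * (variances[i] - variances[j])
            s += (prod > 0) - (prod < 0)             # concordant +1, discordant -1, tie 0
    tau = s / (k * (k - 1) / 2)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = 2 * (1 - NormalDist().cdf(abs(z)))           # normal approximation
    return tau, p
```

Both regression tests share one least-squares helper because they differ only in what is placed on each axis.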
The Cochrane Handbook for Systematic Reviews of Interventions16 has taken a critical stance toward the use of these tests. RevMan, the Cochrane Library meta-analysis software, does not include any options for running them, and their use in the Cochrane Library is limited.22 We therefore used a sample of meta-analyses in printed journals to examine whether these tests are used inappropriately in practice. We examined papers published in 2005 that cited the most common reference for the standard regression test,8 the asymmetry test most commonly used in the current literature. We screened citations in sequential order (as indexed in the Science Citation Index) until we identified 60 meta-analyses in which asymmetry testing had been employed. The 60 meta-analyses examined were within 24 published articles. Although we focused on the standard regression test,8 we also recorded results from the other 2 tests whenever such data were reported. We examined whether these 60 meta-analyses fulfilled the criteria that we set, what they found, and how they interpreted the test results.
In terms of fulfillment of criteria, the most common feasibility problem we encountered in both of our Cochrane data-set analyses was too few studies: three-quarters or more of the meta-analyses included fewer than 10 studies (Table 1). Lack of significant studies was also a major issue. In the wider and restricted data sets, respectively, about half and a third of the meta-analyses included no studies with statistically significant results; a fifth and a quarter had significant or large between-study heterogeneity; and nearly a quarter and a fifth had a ratio of extreme values of variances of 4 or less. Only 366 (5%) of the meta-analyses in the wider Cochrane data set and 98 (12%) of those in the restricted Cochrane data set fulfilled all 4 of the original criteria (Fig. 1, left).
Results of the 3 tests showed statistically significant asymmetry in few meta-analyses (Table 2); overall, in the 2 data sets, rates of significant signals (i.e., statistically significant results) varied between 7% and 18%. They tended to be lowest for the correlation test and highest for the unmodified standard regression test, but did not differ much between the 2 data sets. When the data sets were split according to whether meta-analyses met the criteria for applying asymmetry tests or not, significant signals were more prevalent in the meta-analyses that fulfilled the criteria than in those that did not. Nevertheless, even in the former group, the rates of signals ranged from only 14% to 24%.
The 3 asymmetry tests had modest concordance across the entire data sets (Table 2, Fig. 2); results were largely similar across the wider and restricted Cochrane data sets. Overall, 3% and 4% of the meta-analyses, respectively, gave a significant signal with all 3 tests. In 19% and 22% of the meta-analyses, a result from at least 1 of the 3 tests was significant. Estimated κ values fell generally below 0.5 (range 0.33–0.45) for the concordance of the correlation test with either of the regression diagnostics, and were somewhat higher (0.64–0.66) for concordance between the unmodified and modified regression diagnostics. When analyses were limited to meta-analyses that fulfilled the criteria for asymmetry tests, concordance slightly improved between the correlation and the regression diagnostics (estimated κ 0.39–0.60) and worsened slightly between the unmodified and modified regression diagnostics (estimated κ 0.57–0.59).
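Pairwise κ values and the "all 3" and "at least 1" signal rates reported above can be computed from per-meta-analysis significance calls. A minimal sketch, assuming each test's results are stored as a list of booleans (True = p < 0.10) in the same meta-analysis order, and assuming Cohen's unweighted κ; function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cohen_kappa(x: Sequence[bool], y: Sequence[bool]) -> float:
    """Unweighted Cohen's kappa for 2 binary ratings of the same meta-analyses."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    px, py = sum(x) / n, sum(y) / n
    expected = px * py + (1 - px) * (1 - py)       # agreement expected by chance
    return (observed - expected) / (1 - expected)


def signal_rates(calls: List[Sequence[bool]]) -> Tuple[float, float]:
    """Fraction of meta-analyses significant with all tests, and with at least 1 test."""
    n = len(calls[0])
    all_tests = sum(all(c[i] for c in calls) for i in range(n)) / n
    any_test = sum(any(c[i] for c in calls) for i in range(n)) / n
    return all_tests, any_test


# Usage with 4 hypothetical meta-analyses and 3 tests.
rank = [True, False, False, True]
standard = [True, True, False, False]
modified = [True, False, False, False]
print(cohen_kappa(rank, standard), signal_rates([rank, standard, modified]))
```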
Of the 60 meta-analyses that stated their use of the regression test within the 24 print articles, use of the test was meaningful or appropriate in 7 of the meta-analyses (12%, 95% confidence interval 5%–23%). Of the 24 articles, 6 had at least one meta-analysis where use of the test was appropriate. Twenty-six meta-analyses had significant heterogeneity (all with I² > 50%), and another 4 had I² > 50% without statistically significant heterogeneity. Twenty-six meta-analyses were of fewer than 10 studies. Eighteen meta-analyses included no significant studies; 3 had ratios of extreme variances ≤ 4. Four of the 24 articles also reported rank correlation test results (with similar inferences). Another cited the regression test when what had actually been performed were rank correlation tests. One other article apparently used a regression test based on sample size, a different test than the one that was cited.
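The article does not name the interval method behind "12%, 95% confidence interval 5%–23%" for 7 of 60. As an assumption, an exact (Clopper–Pearson) binomial interval, sketched below with only the standard library, gives bounds of approximately 5% and 23% for these counts. The bisection approach and function names are illustrative choices.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact 2-sided (1 - alpha) confidence interval for a binomial proportion x/n."""
    def solve(f: Callable[[float], float], target: float) -> float:
        # Bisection for f(p) = target; f is decreasing in p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x) = alpha/2; upper bound: P(X <= x) = alpha/2.
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Usage: 7 appropriate uses among 60 journal meta-analyses.
print(clopper_pearson(7, 60))
```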
All but 1 of the 24 articles claimed that the tests were done to estimate publication bias; the exception clarified that the authors tested for “small-study bias, of which publication bias is one potential cause.” Eleven meta-analyses (18%) claimed that there was evidence for publication bias, whereas the other 49 stated that they found no such evidence. All meta-analyses that claimed to have detected publication bias were found to have between-study heterogeneity that was large and statistically significant.
In most meta-analyses, the application of funnel-plot asymmetry tests to detect publication bias is inappropriate or not meaningful. The major problem we found was an insufficient number of studies; lack of studies with significant results and the presence of heterogeneity were also common issues. In a smaller proportion of meta-analyses, differences in precision between the smallest and the largest studies were negligible.
When each of 3 asymmetry (“publication bias”) tests was applied, we found a minority of the examined meta-analyses to have a positive signal. About a fifth of the meta-analyses gave a signal with any of the 3 tests; 3%–4% gave consistent signals for asymmetry with all diagnostics. In the absence of a criterion standard for the presence of publication bias, it is impossible to decide whether these figures were low because the tests we examined were underpowered or because publication bias is uncommon. Moreover, concordance among the 3 tests was modest. Automatic and undocumented use of these tests may lead to unreliable inferences.
A survey of 60 recently published meta-analyses from 24 published reports that had cited use of the standard regression test8 revealed that most had used the test inappropriately. With one exception, all these articles misleadingly equated the results of these tests with the presence or absence of publication bias, ignoring numerous other causes that may underlie differences between small and larger studies.8 Moreover, all signals for publication bias occurred in meta-analyses with large, significant between-study heterogeneity. It is also disquieting that 82% of the meta-analyses were assumed to have no publication bias simply because of a “negative” asymmetry test result.
When these diagnostics give significant signals, this does not necessarily mean that publication bias is present. This applies even when the meta-analyses fulfill all of the 4 eligibility criteria that we considered. In the absence of a prospective registry of studies, publication bias cannot be proven or excluded, because a criterion standard is lacking.
The 4 criteria we used are merely technical and conceptual prerequisites. Even if statistical prerequisites are met, the conceptual assumptions may sometimes not hold. Very large sample size,11 increased attention to the research question and heightened interest in contradicting previous publications with extreme opposite results may contribute as much as or more than statistical significance to dictating publication in selected cases or in entire scientific fields.23
We used the Cochrane Database of Systematic Reviews because it is by far the largest compilation of meta-analyses. The composition of this database may differ from that of the totality of meta-analyses published.22,24,25 Despite some uneven emphasis on specific diseases in the evolving Cochrane Database of Systematic Reviews,26 this database is likely to be less selective than the meta-analyses that appear in the medical journal literature. Meta-analyses published in printed medical journals tend to be larger, but they are also more likely to have large heterogeneity, because they include a greater share of nonrandomized studies. As our survey of journal meta-analyses showed, the percentage of meta-analyses in the journal literature where asymmetry tests are applied inappropriately is also very high.
There can be some subjectivity about thresholds for a definition of when a statistical test is meaningful or appropriate. Our criteria tended toward the lenient; use of even more lenient criteria would increase the proportion of appropriateness, but not to very high percentages (Fig. 1).
Publication bias is compounded by additional biases that pertain to selective outcome reporting27,28 and “significance-chasing”29 in the data published. It would be misleading to claim that all these problems can be addressed with asymmetry tests. Occasionally, in a meta-analysis of many studies, the retrieval of unpublished data may “correct” a funnel-plot asymmetry.30 However, when unpublished data exist, only a portion may be retrievable, so it is unknown what would happen if data from all studies could be retrieved. Whenever both unpublished and published information is available, the results of these 2 types of evidence should be compared. Nevertheless, as has been stressed repeatedly, prospective registration of clinical studies and of their analyses and outcomes5,31 may be the only means to properly address publication bias.
In conclusion, meta-analysts should refrain from inappropriate or meaningless application of funnel-plot asymmetry tests. Readers should not be misled into believing that publication bias has been documented or excluded on the basis of inappropriate use or interpretation of funnel plots.
This article has been peer reviewed.
Contributors: John Ioannidis originated the study concept and wrote the protocol and manuscript, with input and critical revisions by Thomas Trikalinos. John Ioannidis evaluated the meta-analyses published in printed journals; Thomas Trikalinos performed all the statistical analyses. Both authors interpreted the data from their analyses, and approved the final version of the article for publication.
Competing interests: None declared.
Correspondence to: Dr. John Ioannidis, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, 45 110 Ioannina, Greece; fax +30 26510 97867; jioannid/at/cc.uoi.gr