We used issue 2, 2003, of the Cochrane Database of Systematic Reviews (n = 1669 reviews). We imported into Stata software all meta-analyses that had binary outcomes and numerical 2 × 2 table information available (n = 12 709).18
We did not consider studies where no patients in either arm of the study had an event, or all patients in both arms had an event; this eliminated 906 meta-analyses. Zero counts in one arm only were handled in the calculations by adding 0.5 to all data cells, which allowed an odds ratio to be calculated without distorting the data appreciably. Meta-analysis data sets were further scrutinized for similarity. When numbers of studies, patients and events were all the same and summary results were identical (to 7 digits of accuracy), the meta-analyses were considered to contain duplicate data sets and only one of them was retained; these similarity checks eliminated 761 duplicate meta-analyses. We also excluded meta-analyses where only 2 studies were available (n = 4169), because correlation and regression diagnostics cannot be calculated with only 2 studies. Thus, our analysis of the wider Cochrane data set included data from 6873 meta-analyses.
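The study-level handling described above can be sketched as follows. This is a minimal illustration rather than the Stata code used for the analysis; the function name `log_odds_ratio` and its argument names are hypothetical, and applying the 0.5 correction whenever any of the four cells is zero is an assumption about how "zero counts in one arm only" was implemented.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Return (log odds ratio, standard error) for one 2 x 2 table,
    or None when the study carries no information on the odds ratio."""
    total_events = events_t + events_c
    # Not considered: no events in either arm, or events in all patients of both arms.
    if total_events == 0 or total_events == n_t + n_c:
        return None
    a, b = events_t, n_t - events_t   # treatment arm: events, non-events
    c, d = events_c, n_c - events_c   # control arm: events, non-events
    # Assumption: add 0.5 to ALL four cells whenever any single cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

# Exclusion flow reported in the text: 12 709 - 906 - 761 - 4169 = 6873.
remaining = 12709 - 906 - 761 - 4169
```

The standard error returned here is the usual large-sample formula for the log odds ratio; its square is the variance used in the precision criteria.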
The data sets of these meta-analyses are not necessarily independent. Within the same systematic review, different outcomes, contrasts and analyses may be correlated. To minimize correlation, we created a separate, more restricted data set for which we selected one meta-analysis, the one with the largest number of studies, per systematic review. When the largest number of studies was equal in 2 or more of the meta-analyses, we chose the one with the largest number of subjects; if that number was also equal, we chose the one with the largest number of events. Because the meta-analysis with the most studies was selected from each review, the problem of asymmetry tests being inappropriate owing to a limited number of studies was also minimized in this restricted Cochrane data set, which comprised data from 846 meta-analyses.
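The selection rule for the restricted data set amounts to a lexicographic comparison on (studies, subjects, events). A minimal sketch, assuming each meta-analysis is represented as a dictionary with the hypothetical keys `review_id`, `n_studies`, `n_subjects` and `n_events`:

```python
def pick_one_per_review(meta_analyses):
    """Keep one meta-analysis per systematic review: the one with the most
    studies, breaking ties by number of subjects, then by number of events."""
    best = {}
    for ma in meta_analyses:
        # Tuples compare element by element, which gives the tie-breaking order.
        key = (ma["n_studies"], ma["n_subjects"], ma["n_events"])
        current = best.get(ma["review_id"])
        if current is None or key > (
            current["n_studies"], current["n_subjects"], current["n_events"]
        ):
            best[ma["review_id"]] = ma
    return list(best.values())
```

If two meta-analyses in a review are equal on all three counts, this sketch keeps the first one encountered; the text does not specify what was done in that case.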
For each eligible meta-analysis, we evaluated 4 aspects that bear on whether applying an asymmetry test may be meaningful or appropriate. Statistical significance of between-study heterogeneity was tested with the χ²-based Q statistic and considered significant for p < 0.10 (2-tailed);19 the extent of between-study heterogeneity was measured with the I² statistic and considered large for values of 50% or more.20
The number of included studies was noted; 10 or more was considered sufficient. To see if the difference in precision of the largest and the smallest study was sufficiently large (ratio of extreme values of variances > 4), we noted the ratio of the maximal versus minimal variance (the square of the standard error of estimates) across the included studies. Finally, we recorded whether at least one study had found formally statistically significant results (p
Some debate about the extent to which criteria need be fulfilled for asymmetry tests to be meaningful or appropriate is unavoidable. The thresholds listed above are not very demanding, based on the properties of the tests. Results of analyses with alternative, even more lenient criteria are illustrated in Venn diagrams of the 4 overlapping criteria.
The odds ratio was used as the metric of choice for all the meta-analyses. We documented the degree of overlap of the criteria described above and the number of meta-analyses that would qualify, based not only upon each criterion but also on combinations thereof.
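The quantities behind the heterogeneity, study-count and precision criteria can be computed as in the sketch below. It assumes the conventional definitions (Q from inverse-variance fixed-effect weights, and I² = max(0, (Q − df)/Q) × 100%); the function name and the keys of the returned dictionary are illustrative only. The p value for Q, which would come from a χ² distribution with k − 1 degrees of freedom, is left to a statistics package.

```python
def appropriateness_summary(log_ors, ses):
    """Summarize heterogeneity, study count and precision spread for one
    meta-analysis, given per-study log odds ratios and standard errors."""
    k = len(log_ors)
    variances = [se ** 2 for se in ses]
    weights = [1 / v for v in variances]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    variance_ratio = max(variances) / min(variances)
    return {
        "k": k,
        "Q": q,
        "df": df,
        "I2": i2,
        "variance_ratio": variance_ratio,
        "large_heterogeneity": i2 >= 50,          # I2 of 50% or more
        "enough_studies": k >= 10,                # 10 or more studies
        "enough_precision_spread": variance_ratio > 4,
    }
```

Counting how many meta-analyses satisfy each flag, and each combination of flags, gives the overlap figures that a Venn diagram would display.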
We evaluated each meta-analysis by means of 3 asymmetry tests: the 2 most popular tests in the literature (the Begg–Mazumdar τ rank-correlation coefficient,7 and the standard regression test of the standardized effect size [i.e., the natural logarithm of the odds ratio divided by its standard error] against its precision [the inverse of the standard error]8) and a new variant, a modified version of the regression test, which has a lower false-positive rate.10 For all tests, statistical significance was claimed for p < 0.10 (2-tailed).7,8,10
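As an illustration of the standard regression test, the sketch below regresses the standardized effect size on precision and reports the intercept, whose departure from zero is what the test assesses. Unweighted ordinary least squares is assumed here, the function name is hypothetical, and the 2-tailed p value (from a t distribution with k − 2 degrees of freedom) is left to a statistics package.

```python
import math

def regression_asymmetry_intercept(log_ors, ses):
    """Regress log(OR)/SE on 1/SE by ordinary least squares and return
    (intercept, standard error of intercept, t statistic, degrees of freedom)."""
    k = len(log_ors)
    if k < 3:
        raise ValueError("at least 3 studies are needed for the regression")
    x = [1 / se for se in ses]                        # precision
    y = [lo / se for lo, se in zip(log_ors, ses)]     # standardized effect
    x_mean = sum(x) / k
    y_mean = sum(y) / k
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (k - 2)                                # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + x_mean ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t, k - 2
```

The check on `k` reflects the exclusion of meta-analyses with only 2 studies: with 2 points the line fits exactly and no residual variance can be estimated.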
We examined the inferences drawn from these 3 tests in the entire data sets and in the subsets of meta-analyses fulfilling the appropriateness criteria already described. Pairwise concordance between the 3 tests was assessed with the κ statistic.21
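Pairwise concordance can be illustrated with the unweighted Cohen κ for two binary classifications (test significant or not, per meta-analysis). That the unweighted form was the one used is an assumption, and the function name is hypothetical.

```python
def cohen_kappa(calls_a, calls_b):
    """Unweighted Cohen kappa for two equal-length sequences of booleans,
    e.g. whether each of two asymmetry tests was significant."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    p_a = sum(calls_a) / n                 # proportion significant by test A
    p_b = sum(calls_b) / n                 # proportion significant by test B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)   # agreement expected by chance
    if expected == 1:
        return float("nan")                # kappa is undefined in this case
    return (observed - expected) / (1 - expected)
```

Applying the function to each of the 3 pairs of tests gives the pairwise κ values; a value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.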
The Cochrane Handbook for Systematic Reviews of Interventions16 has taken a critical stance toward the use of these tests. RevMan, the Cochrane Library meta-analysis software, does not include any options for running them, and their use in the Cochrane Library is limited.22
We therefore used a sample of meta-analyses in printed journals to examine whether these tests are used inappropriately in practice. We examined papers published in 2005 that cited the most common reference for the standard regression test,8 the asymmetry test most commonly used in the current literature. We screened citations in sequential order (as indexed in the Science Citation Index) until we identified 60 meta-analyses in which asymmetry testing had been employed. The 60 meta-analyses examined were within 24 published articles. Although we focused on the standard regression test,8 we also recorded results from the other 2 tests whenever such data were reported. We examined whether these 60 meta-analyses fulfilled the criteria that we set, what they found, and how they interpreted the application of the test.