Simpson's paradox, also known as the ecological effect, was first described by Yule in 1903 [1] and is named after Simpson's 1951 article [2]. It refers to the phenomenon that an association between two dichotomous variables can be similar within subgroups of a population, say females and males, but change its sign when the individuals of the subgroups are pooled without stratification. This is reflected in the title of a paper by Baker and Kramer ('Good for women, good for men, bad for people' [3]). There are numerous examples, particularly from epidemiology and the social sciences, of associations strongly affected by observed or unobserved dichotomous variables [4]. Even a tale based on Simpson's paradox has been told [9]. The paradox arises from an influencing variable that is not accounted for and is often unobserved. Thus, it may seem that the effect is characteristic of observational studies and can be avoided by randomization.
This is not true, as others have pointed out [10]. As Altman and Deeks note, Simpson's paradox is not really a paradox but a form of bias that results from heterogeneity in the data when it is not accounted for [10]. Tables of hypothetical as well as real data are often presented as examples. However, although these examples are easily recalculated, readers, especially clinicians and practitioners in other fields, still need to understand the nature of the phenomenon.
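To make the sign change concrete, the following minimal sketch recalculates one such hypothetical table. The counts and the subgroup labels are invented for illustration and are not taken from any of the cited studies. Within each subgroup the treated arm has the higher risk, yet the pooled table suggests the opposite, because treated patients are concentrated in the low-risk subgroup.

```python
# Hypothetical counts (an assumption for illustration, not data from the paper):
# each arm is given as (events, patients) within a subgroup.
strata = {
    "low risk":  {"treated": (18, 900), "control": (1, 100)},
    "high risk": {"treated": (20, 100), "control": (135, 900)},
}


def risk(events, n):
    """Proportion of patients with an event."""
    return events / n


def risk_difference(arms):
    """Risk in the treated arm minus risk in the control arm."""
    return risk(*arms["treated"]) - risk(*arms["control"])


def pool(strata):
    """Add up events and patients per arm, ignoring the subgroup."""
    pooled = {}
    for arm in ("treated", "control"):
        events = sum(s[arm][0] for s in strata.values())
        n = sum(s[arm][1] for s in strata.values())
        pooled[arm] = (events, n)
    return pooled


stratum_rd = {name: risk_difference(arms) for name, arms in strata.items()}
pooled_rd = risk_difference(pool(strata))

for name, rd in stratum_rd.items():
    print(f"{name}: risk difference = {rd:+.3f}")
print(f"pooled: risk difference = {pooled_rd:+.3f}")
```

Both subgroup risk differences are positive, while the pooled risk difference is negative. The reversal disappears if the two arms have the same subgroup proportions, which is why imbalanced group sizes matter.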
Baker and Kramer proposed a plot for graphically illustrating Simpson's paradox, later called the Baker-Kramer (BK) plot, which had been invented independently by others much earlier [3]. Their examples stem from hypothetical data. This plot requires the influencing variable to be dichotomous. In a meta-analysis, however, the main source of heterogeneity, and thus the most important influencing variable, is well known and in general not dichotomous: it is the variable 'trial'. A perfect example of Simpson's paradox occurring in a meta-analysis of case-control studies is given by Hanley and Theriault [8]. In this meta-analysis every single trial shows an increased risk for exposed individuals, whereas the pooled analysis reverses this effect.
As a (less perfect) example from a meta-analysis of RCTs, we use a recent systematic review of the effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular diseases [16]. It reported a significant increase of myocardial infarctions in the rosiglitazone group. The authors found a Peto odds ratio of 1.428 with 95 per cent confidence interval [1.031; 1.979] and p-value 0.0321 (fixed effect model) [17]. This meta-analysis immediately raised a discussion not only about the safety of the drug but also about methodological issues: potential heterogeneity, different follow-up times, the large number of trials with no or very few events, and the imbalanced group sizes within many trials [18]. A re-analysis of the data using several variants of the Mantel-Haenszel method found that the significance of the effect is questionable (odds ratio estimates between 1.26 and 1.36, most of them not significant) [18]. Though not consistently significant, the meta-analysis exhibits, with all methods, an excess of events in the treatment group (rosiglitazone) compared to the control group (any other regimen). For example, the risk difference (fixed effect model, Mantel-Haenszel method) yields a combined estimate of 0.002 (95 per cent confidence interval [0.000; 0.004], p-value 0.0549), corresponding to an estimated NNH (number needed to harm) of about 489 patients.
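The NNH is the reciprocal of the risk difference (RD). As a quick check of the arithmetic: the rounded estimate of 0.002 gives 500, so the reported value of about 489 presumably rests on the unrounded risk difference. This is our inference from the reported figures, not a value stated in the source.

```latex
\mathrm{NNH} = \frac{1}{\mathrm{RD}}, \qquad
\mathrm{RD} = 0.002 \;\Rightarrow\; \mathrm{NNH} = 500, \qquad
\mathrm{NNH} \approx 489 \;\Leftrightarrow\; \mathrm{RD} \approx \frac{1}{489} \approx 0.00204 .
```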
One problem with these data is the large number of trials without any events. If the outcome is measured by the risk ratio or the odds ratio, such trials are often excluded from a meta-analysis on the grounds that they contribute no information about the magnitude of the treatment effect [21]. To use all available information, simply pooling all single tables may seem tempting. It appears convenient here because of the considerable number of double-zero studies, despite the general consensus that pooling is discouraged [22]. If pooling is done in spite of this objection for the main endpoint, myocardial infarction (MI), we surprisingly observe that the pooled 2 × 2 table shows the contrary: the risk of MI for the treated individuals is 0.0055 and therefore lower than for the control group (0.0059); see the table of pooled data. The pooled odds ratio is 0.94 with 95 per cent confidence interval [0.69; 1.29] (p-value 0.7109). This (non-significant) effect reversal produced by pooling was also observed by another author, who in light of it found the results of the meta-analysis 'intriguing' [23]. It can be seen as a milder form of Simpson's paradox.
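The contrast between stratified and pooled analysis can be sketched in a few lines. The trial counts below are hypothetical and are not the rosiglitazone data. The Woolf (log odds ratio) interval is an assumption chosen for simplicity, and we do not claim it is the method behind the interval reported above. The Mantel-Haenszel estimator is the standard one, in which a double-zero trial adds nothing to either sum.

```python
import math

# Hypothetical trials (an assumption for illustration, not the rosiglitazone data):
# (events_treated, n_treated, events_control, n_control)
trials = [
    (10, 2000, 2, 500),   # mostly treated patients, low baseline risk
    (10, 500, 30, 2000),  # mostly control patients, higher baseline risk
    (0, 400, 0, 100),     # double-zero trial
]


def cells(et, nt, ec, nc):
    """Return the 2x2 cells a, b, c, d: events and non-events by arm."""
    return et, nt - et, ec, nc - ec


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def pooled_table(trials):
    """Add up all trials cell by cell, ignoring the variable 'trial'."""
    sums = [0, 0, 0, 0]
    for t in trials:
        for i, x in enumerate(cells(*t)):
            sums[i] += x
    return tuple(sums)


def mantel_haenszel_or(trials):
    """Mantel-Haenszel odds ratio: sum(a*d/n) / sum(b*c/n) over trials."""
    num = den = 0.0
    for t in trials:
        a, b, c, d = cells(*t)
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% interval on the log odds ratio scale (needs nonzero cells)."""
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio(a, b, c, d))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Per-trial odds ratios are only defined for trials with events.
per_trial_or = [odds_ratio(*cells(*t)) for t in trials if t[0] + t[2] > 0]
mh_or = mantel_haenszel_or(trials)
pooled = pooled_table(trials)
pooled_or = odds_ratio(*pooled)
ci_low, ci_high = woolf_ci(*pooled)

print("per-trial odds ratios:", [round(o, 2) for o in per_trial_or])
print(f"Mantel-Haenszel odds ratio: {mh_or:.2f}")
print(f"pooled odds ratio: {pooled_or:.2f} [{ci_low:.2f}; {ci_high:.2f}]")
```

In this sketch every trial with events has an odds ratio above 1, and so does the Mantel-Haenszel estimate, while the pooled table yields an odds ratio below 1. The double-zero trial leaves the Mantel-Haenszel estimate unchanged but alters the pooled denominators, which illustrates why naive pooling is not a neutral way of using all available information.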
Pooled data of rosiglitazone meta-analysis (full data see ref. )
In the next section, we first develop two kinds of plots that reveal and illustrate the mechanism of Simpson's paradox and effect reversal, using the rosiglitazone example. A third plot emerges from overlaying the two. In the results section, we apply the plots to the data given by Hanley and Theriault [8] and discuss both methods and results. The paper ends with conclusions.