After a meeting held in November 2001 to which authors of all published meta-epidemiological studies were invited, five datasets were made available for us to use in a combined study. We excluded data from two of the studies: the data of Moher et al were not available in a suitable format,5
whereas McAuley et al did not assess the methodological quality of their included trials.10
We analysed the data from the three remaining studies, which included information on allocation concealment, blinding, and outcome events.4 6 7
Schulz et al used 33 meta-analyses from the Pregnancy and Childbirth Group of the Cochrane Collaboration.4
Each meta-analysis included at least five trials with a combined total of at least 25 outcome events. Kjaergard et al used 14 meta-analyses from 11 systematic reviews, all of which included at least one trial of at least 1000 participants.6
Egger et al analysed 122 meta-analyses from the Cochrane Database of Systematic Reviews
that contained at least five randomised trials.7
Table 1 summarises the methods of each of these studies, including the way that meta-analyses were selected from within systematic reviews and the trial characteristics examined.
Table 1 Overview of contributing meta-epidemiological studies
We searched Medline and then Embase to assign each meta-analysis and each trial a unique identifier provided by the literature databases. References not indexed in either database were assigned identifiers manually. Using these identifiers, we identified meta-analyses containing at least one overlapping trial and removed duplicate meta-analyses until no overlap remained, except for a small number of trials that contributed to more than one meta-analysis because they had more than one intervention arm or reported more than one outcome.
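The deduplication step can be sketched as a greedy procedure over sets of trial identifiers. The study does not report the exact algorithm or tie-breaking rule used, so the function and data structures below are purely illustrative:

```python
def deduplicate(meta_analyses):
    """Greedily drop meta-analyses until no two share a trial.

    meta_analyses: dict mapping a meta-analysis id to the set of unique
    trial identifiers it contains (hypothetical structure; the selection
    rule actually used in the study is not reported).
    """
    kept = dict(meta_analyses)
    while kept:
        # for each meta-analysis, count trials shared with any other
        overlap = {
            m: len(trials & set().union(*(t for k, t in kept.items() if k != m)))
            for m, trials in kept.items()
        }
        worst = max(overlap, key=overlap.get)
        if overlap[worst] == 0:
            break
        del kept[worst]  # remove the most-overlapping meta-analysis
    return kept

# example: MA1 and MA2 share trial t2, so one of them is dropped
mas = {"MA1": {"t1", "t2", "t3"}, "MA2": {"t2", "t4"}, "MA3": {"t5", "t6"}}
independent = deduplicate(mas)
```

After the loop, the surviving meta-analyses are pairwise disjoint in their trial identifiers, matching the goal of the deduplication described above.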
Assessment of trial quality
In the study by Schulz et al, one researcher, who was blinded to the trial outcome, assessed the methodological quality of the included trials using a detailed classification scheme.4
In the study by Kjaergard et al, assessments were made by two observers who were not blinded to study results.6
Inter-rater reliability of these quality assessments, evaluated in 30 randomly selected randomised controlled trials, was high (intraclass correlation coefficient 0.96). The study by Egger et al was based on quality assessments by the authors of the Cochrane reviews, which were generally done in duplicate by two observers.7
Table 2 shows the definitions used in the three studies for concealment of allocation and blinding. Definitions of adequate allocation concealment and blinding were similar in all three studies. Allocation concealment was assessed as adequate, unclear, or inadequate in two studies4 7
and as adequate or inadequate in the other.6
Blinding was assessed as present (the trial was described as double blind and used adequate methods, such as identical placebo tablets, or included blinding of the person assessing outcomes) or absent (blinding was not performed or not reported, or distinguishable interventions, such as tablets and injections, were compared). We assessed inter-study reliability of quality assessment using trials included in more than one study and found it to be good (median κ statistic 0.67).
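Agreement between studies was quantified with the κ statistic, which compares the observed agreement between two sets of ratings with the agreement expected by chance. A minimal two-rater implementation (not the authors' code, with hypothetical ratings) is:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # chance agreement from each rater's marginal category frequencies
    ca, cb = Counter(ratings_a), Counter(ratings_b)
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# hypothetical concealment ratings of the same six trials by two studies
study_1 = ["adequate", "adequate", "inadequate", "unclear", "adequate", "inadequate"]
study_2 = ["adequate", "unclear", "inadequate", "unclear", "adequate", "inadequate"]
kappa = cohens_kappa(study_1, study_2)  # → 0.75
```

A κ of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the median of 0.67 reported above therefore reflects substantial inter-study agreement.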
Table 2 Summary of definitions used in meta-epidemiological studies for assessments of allocation concealment of study participants and of blinding
Interventions and outcomes
For each meta-analysis in the final dataset, we classified the type of intervention and the type of outcome. Classifications were finalised before we examined associations with trial characteristics. We coded interventions using the classification of Moja et al11
(drugs; rehabilitation or psychosocial; prevention or screening; surgery or radiotherapy; communication, organisational, or educational; alternative therapeutic; other) and subsequently dichotomised them as drug or non-drug interventions.
We classified outcomes in two ways: firstly, as objectively or subjectively assessed, and, secondly, as all cause mortality or other outcomes. The definition of objective and subjective outcomes was based on the extent to which outcome assessment could be influenced by investigators’ judgment. Objectively assessed outcomes included all cause mortality, measures based on a recognised laboratory procedure (such as measurement of haemoglobin concentrations), other objective measures (such as preterm birth), and surgical or instrumental outcomes (all of these were concerned with childbirth, such as caesarean section or instrumental delivery). Note that such surgical outcomes (classified as objectively assessed) depend on doctors’ decisions, which could, in the absence of blinding, be affected by knowledge of the intervention received.
Subjectively assessed outcome measures included patient reported outcomes, physician assessed disease outcomes (such as vascular events, pyelonephritis, or respiratory distress syndrome), measures combined from several outcomes, and withdrawals or study dropouts. When different methods of outcome assessment were used in different trials in the same meta-analysis we classified the review according to the most subjective method. For example, reviews of smoking cessation used “the most rigorous assessment reported by each included trial.” For some trials this was an objective measurement of exhaled carbon monoxide, for some it was repeated questionnaires, and for some it was a single interview. We therefore classified the outcome in this meta-analysis as “patient reported,” based on the trials using interviews to assess smoking behaviour.
We measured intervention effects as odds ratios. Outcome events were recoded where necessary so that an odds ratio below 1 indicated a beneficial effect of the experimental intervention. We calculated the combined effect estimates separately in trials with and without the characteristic of interest (inadequate or unclear allocation concealment or lack of blinding). We used logistic regression models described previously12
to estimate ratios of odds ratios comparing intervention effects in trials with and without the characteristic of interest. For example, a ratio of odds ratios of 0.7 for trials without blinding would imply that the estimates of intervention effects were exaggerated by 30% in trials without blinding compared with trials with blinding. We derived 95% confidence intervals using robust standard errors allowing for heterogeneity between meta-analyses.12
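As a toy numerical illustration of a ratio of odds ratios (the models actually used in the study additionally allow for clustering of trials within meta-analyses and use robust standard errors), assume hypothetical pooled 2×2 counts for blinded and unblinded trials:

```python
def odds_ratio(events_exp, n_exp, events_ctl, n_ctl):
    """Odds ratio from a 2x2 table of events and non-events."""
    a, b = events_exp, n_exp - events_exp  # experimental arm
    c, d = events_ctl, n_ctl - events_ctl  # control arm
    return (a * d) / (b * c)

# hypothetical counts: events / participants in each arm
or_blinded = odds_ratio(30, 100, 40, 100)    # OR ~ 0.64
or_unblinded = odds_ratio(20, 100, 40, 100)  # OR ~ 0.38
ror = or_unblinded / or_blinded              # ratio of odds ratios ~ 0.58

# ror < 1: unblinded trials show a more beneficial (exaggerated) effect,
# here by about (1 - 0.58) * 100 ~ 42% on the odds ratio scale
```

A ratio of odds ratios below 1 thus means that trials without the quality characteristic yield more favourable estimates of the experimental intervention than trials with it.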
We also calculated ratios of odds ratios separately in each meta-analysis and combined these using random effects meta-analyses, in order to estimate variability between meta-analyses in the effect of trial characteristics.12
Ratios of odds ratios estimated in this way were consistent with those estimated by logistic regression and are not reported here. Note that meta-analyses in which all trials had the same value of a characteristic (for example, allocation was inadequately concealed in all trials) did not contribute to the estimated effect of that characteristic.
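The per-meta-analysis ratios of odds ratios can be pooled with a standard random effects model on the log scale. The sketch below uses the DerSimonian-Laird method-of-moments estimator as one common choice; it is not the authors' implementation (reference 12 describes the models actually used), and the input values are hypothetical:

```python
import math

def random_effects_pool(log_rors, variances):
    """DerSimonian-Laird random effects pooling of log ratios of odds ratios."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, log_rors)) / sum(w)
    # Q statistic and method-of-moments between-meta-analysis variance
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rors) - 1)) / c)
    # re-weight by total (within + between) variance and pool
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2

# hypothetical log ratios of odds ratios from three meta-analyses
pooled, se, tau2 = random_effects_pool([-0.2, -0.5, -0.3], [0.04, 0.05, 0.06])
pooled_ror = math.exp(pooled)
```

Here tau2 estimates the variability between meta-analyses in the effect of a trial characteristic, which is the quantity of interest in this secondary analysis.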
We included interaction terms in logistic regression models to assess whether effects of trial quality varied with the type of intervention or type of outcome. To assess whether there was confounding between the effects of allocation concealment and blinding, we estimated the effect of each characteristic on intervention effects in the same logistic regression model. All analyses were done in Stata SE version 9.0 (Stata Corporation, College Station, Texas).