For our analysis we targeted specific harms of various medical interventions for which data were already available from large-scale randomized controlled trials. These data were obtained from a previous study in which 1754 systematic reviews in the Cochrane Database of Systematic Reviews (issue 3, 2003) were screened to identify quantitative information on at least 4000 subjects for 66 well-defined, specific harms of various interventions.10
With 4000 subjects equally allocated, there is 80% power (α = 0.05) to detect a difference of 1 percentage point between the compared arms for an otherwise uncommon event (risk < 0.25% in the control group). The identified harms had been clearly defined on the basis of clinical and laboratory criteria and with explicit grading of severity and seriousness.10
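The power statement above can be checked with the standard normal-approximation calculation for comparing two independent proportions. This is a sketch under assumed illustrative risks (control risk 0.25%, a 1-percentage-point increase in the treated arm, 2000 subjects per arm), not the authors' own calculation:

```python
from math import sqrt, erf

def phi(x):
    # standard normal cumulative distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    # normal-approximation power for a two-sided test of two proportions
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    z_alpha = 1.959963984540054  # two-sided 5% critical value
    return phi(abs(p2 - p1) / se - z_alpha)

# hypothetical example: control risk 0.25%, treated risk 1.25%, 2000 per arm
print(power_two_proportions(0.0025, 0.0125, 2000))  # well above 0.80
```

With these assumed risks, 4000 equally allocated subjects give power comfortably above the 80% stated in the text.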
For each harm, we searched for respective published nonrandomized studies that had at least 4000 subjects. Single randomized trials with such a sample size are difficult to conduct; thus, we allowed a meta-analysis of several trials to meet the sample-size threshold together. Because nonrandomized studies with this sample size are more readily feasible, each eligible observational study had to have at least 4000 subjects on its own.
We considered all nonrandomized controlled studies in which the comparison (intervention v. no treatment, or intervention v. other intervention) was similar to that in the respective randomized trial(s), with “no treatment” corresponding to “placebo.” We excluded noncontrolled studies, in which neither the absolute nor the relative risk conferred by the intervention can be estimated, unless the specific harm was so rare in the control population that randomized trials had recorded no events in control subjects (in which case the absolute risk among treated subjects would still be meaningful to compare between study designs).
We included nonrandomized studies that had selected participants with the same, overlapping or wider indications for the intervention as those used in the respective randomized trials; we excluded nonrandomized studies in which the indications differed entirely from those in the randomized trial populations.
We accepted risk estimates in nonrandomized studies regardless of whether they were crude or adjusted, were based on numbers of people or person-years, or were directly given or indirectly estimated from the published data.
Two of us (P.P. and G.C.) searched MEDLINE (PubMed) independently (last search October 2004) for qualifying nonrandomized studies that would correspond to each of the 66 harms.10
Whenever there was disagreement on whether a study qualified, consensus was reached with a third investigator (J.I.). For our search strategies, we used the names of the interventions, including the drug-specific names when a class of drugs was involved. There is relatively limited evidence on how best to search for harms-related literature, and thus we started with broad search strategies to maximize sensitivity and avoid missing eligible studies. We also used various terms representing the harms and settings under study, whenever deemed appropriate for improving search specificity without compromising sensitivity. We focused on English-language studies involving humans. The search strategy is described in online Appendix 1 (www.cmaj.ca/cgi/content/full/cmaj.050873/DC1). We also checked the reference lists of retrieved studies for additional pertinent reports.
For each eligible nonrandomized study, we extracted information on the authors, the year of publication, the study design, the setting, the total sample size, and the estimates of absolute and relative risks conferred by the interventions under study. We captured both crude and adjusted estimates when both were available, and we recorded the covariates used for adjustment.
For the respective evidence from the randomized trials, we retrieved information already recorded10 on the authors and the year of publication of each trial, the setting, the total sample size, and the absolute risk differences and risk ratios with 95% confidence intervals (CIs).
Risk ratios were always available for evidence from the randomized trials. For evidence from the nonrandomized studies, estimates of relative risk included risk ratios, incidence rate ratios, hazard ratios from Cox models, and odds ratios from case–control studies (as proxies of population-level risk ratios). All results were expressed as point estimates with 95% CIs. When available, adjusted estimates were preferred over crude ones. When several adjusted estimates existed, we selected the one that considered the largest number of covariates. Analyses using the crude estimates yielded qualitatively similar results (data not shown).
For each harms topic, we compared the risk estimates from the randomized trial (or from the meta-analysis of several trials) with the risk estimates from the respective nonrandomized study (or the meta-analysis of several studies). Whenever 2 or more nonrandomized studies were available for a specific topic, we computed a random-effects summary estimate,13 using the inverse variance method and estimating the variance from the 95% CIs of each relative risk. Random-effects syntheses were already available for the randomized evidence, as previously described.10
Between-study heterogeneity was assessed with the Q statistic (considered significant at p < 0.10).
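The synthesis described above — recovering the variance of each log relative risk from its 95% CI, combining studies by inverse-variance weighting with a random-effects (DerSimonian–Laird) model, and computing the Q statistic — can be sketched as follows. The relative risks and CIs below are hypothetical, and this is an illustrative implementation, not the authors' code:

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile

def var_from_ci(lo, hi):
    # variance of the log relative risk recovered from its 95% CI
    return ((math.log(hi) - math.log(lo)) / (2 * Z95)) ** 2

def dersimonian_laird(rrs, cis):
    """Random-effects summary relative risk with the Q statistic."""
    y = [math.log(rr) for rr in rrs]          # log relative risks
    v = [var_from_ci(lo, hi) for lo, hi in cis]
    w = [1 / vi for vi in v]                  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # heterogeneity
    k = len(y)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)        # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]      # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    summary = math.exp(y_re)
    ci = (math.exp(y_re - Z95 * se_re), math.exp(y_re + Z95 * se_re))
    return summary, ci, q

# two hypothetical nonrandomized studies of the same harm
rr, ci, q = dersimonian_laird([1.8, 2.4], [(1.2, 2.7), (1.5, 3.8)])
```

The summary estimate necessarily lies between the individual study estimates, and Q is compared against a chi-squared distribution with k − 1 degrees of freedom to test heterogeneity.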
We examined whether the estimated increase in relative risk (relative risk – 1) with the harmful intervention (v. no treatment or a less harmful intervention) differed more than 2-fold between the randomized and nonrandomized studies and, if so, in which direction. The 2-fold threshold has been used previously to compare efficacy data from observational and randomized studies.11
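The 2-fold comparison of excess relative risks can be expressed as a small helper. This is a hypothetical illustration that assumes both designs estimate RR > 1 (i.e., both point toward harm); names and return values are my own, not the authors':

```python
def excess_rr_differs_twofold(rr_randomized, rr_nonrandomized):
    """Do the increases in relative risk (RR - 1) differ by more than 2-fold?

    Returns (differs, description). Assumes RR > 1 in both designs.
    """
    excess_r = rr_randomized - 1
    excess_n = rr_nonrandomized - 1
    if excess_r <= 0 or excess_n <= 0:
        raise ValueError("comparison assumes RR > 1 in both study designs")
    ratio = excess_n / excess_r
    if ratio > 2:
        return True, "larger in nonrandomized studies"
    if ratio < 0.5:
        return True, "larger in randomized trials"
    return False, "within 2-fold"

# e.g. RR 1.5 in trials v. RR 2.2 in observational studies:
# excess risks 0.5 v. 1.2, a ratio of 2.4, so they differ more than 2-fold
```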
We also tested secondarily whether the differences in relative risk between the randomized and nonrandomized studies were beyond chance (p < 0.05), based on a z statistic estimated from the difference of the natural logarithms of the relative risks and the variance of this difference.14
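The z test just described — the difference of log relative risks divided by the standard error of that difference, with each variance recovered from the corresponding 95% CI — can be sketched as follows, with hypothetical inputs:

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile

def z_for_rr_difference(rr_rand, ci_rand, rr_obs, ci_obs):
    # z statistic and two-sided p value for the difference of two
    # log relative risks; variances recovered from the 95% CIs
    def var(ci):
        lo, hi = ci
        return ((math.log(hi) - math.log(lo)) / (2 * Z95)) ** 2
    diff = math.log(rr_rand) - math.log(rr_obs)
    se = math.sqrt(var(ci_rand) + var(ci_obs))
    z = diff / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# hypothetical example: RR 1.5 (1.2-1.9) in trials v. RR 3.0 (2.0-4.5) in
# nonrandomized studies
z, p = z_for_rr_difference(1.5, (1.2, 1.9), 3.0, (2.0, 4.5))
```

Identical estimates with identical CIs give z = 0 and p = 1; the hypothetical discrepancy above is beyond chance at the p < 0.05 threshold used in the text.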
Finally, we examined whether the differences in absolute risk between the randomized and nonrandomized studies were greater than 2-fold in their point estimates and, secondarily, whether these differences were beyond chance (p < 0.05). We excluded from our analyses topics for which the absolute risk of harm would depend on the available follow-up, since this could vary in different studies.