Our results challenge the current consensus about a hierarchy of study designs in clinical research. Contrary to prevailing beliefs, the “average results” from well-designed observational studies (with a cohort or case–control design) did not systematically overestimate the magnitude of the associations between exposure and outcome as compared with the results of randomized, controlled trials of the same topic. Rather, the summary results of randomized, controlled trials and observational studies were remarkably similar for each clinical topic we examined. Viewed individually, the observational studies had less variability in point estimates (i.e., less heterogeneity of results) than randomized, controlled trials on the same topic. In fact, only among randomized, controlled trials did some studies report results in a direction opposite that of the pooled point estimate, a paradoxical finding (e.g., treatment of hypertension was unexpectedly associated with higher rates of coronary heart disease in several clinical trials).
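For orientation, the pooled point estimate and the heterogeneity we describe can be written in the standard fixed-effect, inverse-variance form; this is a conventional formulation given for illustration, not necessarily the exact method of each meta-analysis we reviewed:

$$ \hat{\theta}_{\text{pooled}} = \frac{\sum_{i=1}^{k} w_i \, \hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\widehat{\operatorname{Var}}(\hat{\theta}_i)}, $$

where $\hat{\theta}_i$ is, for example, the log relative risk from study $i$. Heterogeneity among the $k$ studies is then commonly quantified with Cochran's statistic $Q = \sum_{i=1}^{k} w_i (\hat{\theta}_i - \hat{\theta}_{\text{pooled}})^2$, which is approximately chi-square with $k-1$ degrees of freedom when the studies share a common effect; larger values of $Q$ correspond to the greater variability we observed among the trials.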
Although the data we present challenge accepted beliefs, the findings are consistent with three other types of available evidence. First, previous investigations have shown that observational cohort studies can produce results similar to those of randomized, controlled trials when similar criteria are used to select study subjects. Second, data from nonmedical research do not support a hierarchy of research designs. Third, the finding that there is substantial variation in the results of randomized, controlled trials is consistent with prior evidence of contradictory results among randomized, controlled trials.
First, there is evidence that observational studies can be designed with rigorous methods that mimic those of clinical trials and that well-designed observational studies do not consistently overestimate the effectiveness of therapeutic agents. An analysis [19] of 18 randomized and observational studies in health-services research found that treatment effects may differ according to research design, but that “one method does not give a consistently greater effect than the other.” The treatment effects were most similar when the exclusion criteria were similar and when the prognostic factors were accounted for in observational studies.
A specific method used to strengthen observational studies (the “restricted cohort” design [9]) adapts principles of the design of randomized, controlled trials to the design of an observational study as follows: it identifies a “zero time” for determining a patient's eligibility and base-line features, uses inclusion and exclusion criteria similar to those of clinical trials, adjusts for differences in base-line susceptibility to the outcome, and uses statistical methods (e.g., intention-to-treat analysis) similar to those of randomized, controlled trials. When these procedures were used in a cohort study [9] evaluating the benefit of beta-blockers after recovery from myocardial infarction, the use of a restricted cohort produced results consistent with corresponding findings from the Beta-Blocker Heart Attack Trial [20]: the three-year reductions in mortality were 33 percent and 28 percent, respectively.
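To make the arithmetic behind such percentages explicit: a relative reduction in mortality compares event rates between treated and untreated groups. The figures below are illustrative and are not the trials' actual data:

$$ \text{relative risk reduction} = 1 - \frac{p_{\text{treated}}}{p_{\text{control}}}, $$

so, for example, three-year mortality of 6.7 percent with treatment against 10.0 percent without gives $1 - 0.067/0.100 = 0.33$, a 33 percent reduction.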
Second, data in the literature of other scientific disciplines support our contention that research design should not be considered a rigid hierarchy. A comprehensive review of research on various psychological, educational, and behavioral treatments [21] identified 302 meta-analyses and examined the reports on the basis of several features, including research design. Results were presented from the 74 meta-analyses that included studies with randomized and observational designs. To allow for comparisons among various topics with different outcome variables, effect size was used as a unit-free measure of the effect of the intervention. The observational designs did not consistently overestimate or underestimate the effect of treatment; the mean value of the difference was a trivial 0.05. Thus, these independent data do not support the contention that observational studies overestimate effects as compared with randomized, controlled trials.
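The unit-free effect size invoked here is conventionally the standardized mean difference (Cohen's d); the definition below is the standard one, given for orientation rather than as the exact computation used in the review cited:

$$ d = \frac{\bar{x}_T - \bar{x}_C}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_T - 1)\,s_T^2 + (n_C - 1)\,s_C^2}{n_T + n_C - 2}}, $$

where the subscripts denote the treated and control groups. Because $d$ is expressed in standard-deviation units, a mean design-related difference of 0.05 amounts to one twentieth of a standard deviation.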
Third, a review of more than 200 randomized, controlled trials on 36 clinical topics found numerous examples of conflicting results [22]. A more recent example is offered by studies addressing whether therapy with monoclonal antibodies improves outcomes among patients with septic shock (reviewed by Horn [23] and Angus et al. [24]). In addition, one study [25] found that the results of meta-analyses based on randomized, controlled trials were often discordant with those of large, simple trials on the same clinical topic. Regardless of the reasons why randomized, controlled trials produce heterogeneous results, the available evidence indicates that a single randomized trial (or only one observational study) cannot be expected to provide a gold-standard result that applies to all clinical situations.
One possible explanation for the finding that observational studies may be less prone to heterogeneity in results than randomized, controlled trials is that each observational study is more likely to include a broad representation of the population at risk. In addition, there is less opportunity for differences in the management of subjects among observational studies. For example, although physicians admittedly do not use therapeutic agents in a uniform way, an observational study would usually include patients with coexisting illnesses and a wide spectrum of disease severity, with treatment tailored to the individual patient; each observational study therefore samples much the same breadth of clinical practice. In contrast, each randomized, controlled trial may have a distinct group of patients as a result of specific inclusion and exclusion criteria regarding coexisting illnesses and severity of disease, and the experimental protocol for therapy may not be representative of clinical practice.
The relevance of our findings extends beyond their implications for expert panels such as the U.S. Preventive Services Task Force [7]. A popular “users' guide” for clinicians [26] warns that “the potential for bias is much greater in cohort and case–control studies than in randomized, controlled trials, [so that] recommendations from overviews combining observational studies will be much weaker.” The studies cited to support that claim [27,28] (and similar claims [29]), however, compare randomized, controlled trials with trials using historical controls, unblinded clinical trials, or clinical trials without randomly assigned control subjects, not with the types of cohort and case–control studies included in our investigation. Thus, data based on “weaker” forms of observational studies are often mistakenly used to criticize all observational research.
We examined the possibility that the quality of individual studies could explain our findings. For example, randomized, controlled trials that did not satisfy criteria with respect to quality could be the source of variability in point estimates, or the observational studies might be of uniformly high quality. When standard assessments of quality were applied to the studies, however, no association was found between the number of criteria for high-quality research that a study satisfied and the rank order of its point estimate (data not shown). Thus, although quality scores have been used in some situations to separate high-quality from low-quality randomized, controlled trials [8], our results are consistent with those of other studies [30] that did not find an association between summary measures of quality and treatment effects. The issue of how to judge the validity of each study (in terms of the methodologic aspects relevant to each investigation) is beyond the scope of this report. However, judging validity is often not as simple as identifying the type of research design or assessing general characteristics of the study [8,26].
The meta-analyses of randomized, controlled trials and observational studies that we evaluated included single reports that combined the two types of research design, as well as separate reports for each category. This mix of reports offers reassurance that our findings are not attributable to the methods used in each meta-analysis. (The overall paucity of meta-analyses including both randomized, controlled trials and observational studies of the same research topic is consistent with our premise that observational studies are not considered trustworthy and that they are therefore not included in such investigations.) The validity of our analysis is also supported by another investigation comparing randomized, controlled trials and observational studies of screening mammography that found results similar to ours [31].
Despite the consistency of our results (involving five clinical topics and 99 separate studies), as well as the confirmatory evidence available in the literature, we believe that the appropriate role of observational studies may vary in different situations. For example, observational investigations of some kinds of treatments (e.g., surgical operations and other invasive therapies) may be more prone to selection bias than the observational studies of drugs and noninvasive tests that we examined in this study, and “softer” outcomes (e.g., functional status) may be more readily assessed in randomized, controlled trials. In addition, we are aware of the risk that the results of poorly done observational studies may be used inappropriately, for example, to promote ineffective alternative therapies [32].
Randomized, controlled trials will (and should) remain a prominent tool in clinical research, but the results of a single randomized, controlled trial, or of only one observational study, should be interpreted cautiously. If a randomized, controlled trial is later determined to have given wrong answers, evidence both from other trials and from well-designed cohort or case–control studies can and should be used to find the right answers. The popular belief that only randomized, controlled trials produce trustworthy results and that all observational studies are misleading does a disservice to patient care, clinical investigation, and the education of health care professionals.