Despite its shortcomings, this careful, albeit not exhaustive, comparison between randomised controlled trials and observational studies using data from an electronic primary care medical record database reveals several important insights. From an overall perspective, our results suggest that observational studies using databases might produce valid results concerning the efficacy of cardiovascular drug treatments.
Rigour of database studies
Our studies comparing the performance of the database and the randomised controlled trials were performed in as rigorous a fashion as possible.
In addition to using similar inclusion and exclusion criteria and relatively similar time frames, we analysed studies with both a simulated “intention to treat” and an “as treated” design. We analysed data with multiple imputation plus Cox adjusted hazard ratios, and also with propensity scores plus stratified Cox unadjusted hazard ratios. The propensity score is useful for identifying heterogeneity and also incorporates missing data into the analysis in a fashion different from the multiple imputation used with the primary Cox method. As a secondary verification analysis, we used a subset of the overall cohort without missing data on the key confounders (systolic blood pressure, body mass index, and smoking) to ensure that missing data did not influence the results in the overall cohort. We assessed use of non-study drugs to confirm that cointervention during the study did not account for the results. Computerised random matching, which also delineated the start time for the unexposed group, obviated the potential for unanticipated bias related to that start time.
Overall study results
We analysed results of the outcomes for myocardial infarction, stroke, coronary revascularisation, and death for six comparative studies (table 2 and fig 2). We examined the aggregate database study results with conventional biostatistical analyses (Cox adjusted hazard ratios or propensity score analyses, or both) and our newly described prior event rate ratio (PERR) adjustment technique.28 29
When analysed with conventional biostatistical analyses, the database outcome results (excluding death) did not differ significantly from those in the randomised controlled trials in nine of the 17 comparisons. In every comparison where the conventional analyses agreed with the trial, the PERR analysis also did not differ significantly from the trial.
As shown in table 2 and figure 2, when the database outcomes analysed with conventional biostatistical techniques differed significantly from the trial, the PERR analysis results were either not significantly different from or much more similar to the trial results.
The instances where the database results analysed by conventional biostatistical methods differed importantly from the results in the trial presumably reflect unmeasured confounding by indication in the database studies. Thus our findings support concerns that the validity of observational studies must always be viewed with circumspection. The studies reported herein, however, suggest that the PERR technique can identify (by differing from the results with standard statistical methods) and largely correct for the effects of unmeasured confounding, when it exists. The availability in the database of previous event rates, rather than only prevalence data, permitted performance of this analysis.
PERR analytical technique
The underlying hypothesis of the PERR analytical technique is that a comparison between the event rate for a specific outcome in a cohort’s exposed and unexposed patients before entry into the study should reflect the effect of all confounders on that specific outcome, independent of the effect of treatment. This assumption holds only when neither the exposed nor the unexposed patients have been treated with the study drug before the start of the study. When this condition is met, the ratio of previous event rates in the exposed and unexposed patients should reflect the aggregate effect of all identified and unidentified confounders.
Therefore, when the unadjusted incidence rate ratio or hazard ratio of that outcome during the study is divided by the ratio for that outcome before the study, this adjustment should correct for the aggregate effects of all identified and unmeasured confounders.
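The arithmetic of this adjustment can be sketched as follows; the event counts and person-years below are invented for illustration and do not come from the study.

```python
# Minimal sketch of the prior event rate ratio (PERR) adjustment.
# All counts and person-years are hypothetical, for illustration only.

def rate_ratio(events_exposed, py_exposed, events_unexposed, py_unexposed):
    """Unadjusted incidence rate ratio between exposed and unexposed groups."""
    return (events_exposed / py_exposed) / (events_unexposed / py_unexposed)

# Prior period: event rates before study entry, when neither group had
# received the study drug, so the ratio reflects only confounding.
prior_ratio = rate_ratio(30, 1000, 20, 1000)   # ≈ 1.5: exposed sicker at baseline

# Study period: the raw ratio mixes the treatment effect with confounding.
study_ratio = rate_ratio(36, 1000, 30, 1000)   # ≈ 1.2

# PERR adjustment: dividing the study ratio by the prior ratio cancels the
# aggregate effect of identified and unmeasured confounders.
perr_adjusted = study_ratio / prior_ratio      # ≈ 0.8
```

On this invented data, a drug that looks harmful in the crude study-period comparison (ratio above 1) appears protective once the baseline difference between the groups is divided out.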
When there are no unmeasured confounders, reflected by similar results of the database Cox adjusted hazard ratio and the randomised controlled trial hazard ratio, the PERR adjusted results should be similar to the Cox adjusted hazard ratio. Based on the empirical findings in these studies, the PERR adjustment seemed to function in this fashion.
When there are unmeasured confounders, presumably resulting from confounding by indication, the results of the PERR adjusted hazard ratio and the Cox adjusted hazard ratio should differ. Our empirical results show that in every instance where the Cox adjusted hazard ratio in the database study differed from the result of the trial, suggesting the presence of “unidentified confounding,” the PERR adjustment yielded a result much more consistent with the findings in the trial. Most importantly, in all but one instance where unmeasured confounding seemed to be present, the PERR adjusted value identified its presence by differing significantly from the Cox adjusted hazard ratio.
Identification of the PERR method emerged from these studies because the direct comparison of the database observational study and the randomised controlled trial provided a presumed correct answer against which to validate the database results. Further investigation is necessary to fully validate the PERR technique. More extensive statistical simulation studies would determine its limitations and applications and the applicability of the method to additional outcomes. It is also important to appreciate that this technique is outcome specific; it cannot be extrapolated from one outcome to another. Finally, it is restricted to outcomes for which previous events can be ascertained. If an outcome was a study exclusion criterion, it cannot be analysed with this approach; nor can the approach be applied to death.
The PERR method differs from, and seems to be more widely applicable than, other methods that have been developed in an attempt to address hidden bias.42
As confirmed in our studies, propensity score analysis does not overcome unmeasured confounding. When combined with sensitivity analyses, however, it might provide results that can be interpreted as unlikely to have been influenced by unmeasured covariates.43 44 45
Recently, propensity scores combined with regression calibration were used to address unobserved variables under certain conditions.46 47
Instrumental variable analysis, used commonly in economics, has also been used to address unmeasured confounding. An instrumental variable analysis requires identification of a factor that affects the assignment to treatment but has no direct effect on the outcome.48 49 50
Its applicability and validity for studies of therapeutic efficacy have not been widely examined.42 51 52
Some have suggested that this technique is most suited to address health policy issues rather than specific clinical issues of treatment effectiveness.48
Both the propensity score calibration and the instrumental variable analysis methods have important constraints. The propensity score calibration technique requires the presence of a validation study, whereas the instrumental variable analysis requires identification of an appropriate instrument. These requirements limit their applicability to a wide variety of studies.
Of interest, the DID (difference-in-differences) method used in economic studies has some similarities to the PERR method, in that it compares the before and after change in behaviour between two groups.53 54 55
The key assumption behind the DID method, similar to that of PERR, is that the distribution of the unobserved confounding variables in the treated and comparison groups, and the effect of these variables on the outcome, remain the same before and during the study period. The DID method is also used commonly in psychology, where it is called the before and after design with an untreated comparison group.56 57
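For comparison, the DID idea can be sketched on the additive scale; the event rates below are hypothetical, and PERR can be viewed as the multiplicative analogue of this calculation.

```python
# Minimal sketch of the difference-in-differences (DID) estimator.
# Event rates (per 1000 person-years) are hypothetical.

treated_before, control_before = 30.0, 20.0   # before the study period
treated_during, control_during = 36.0, 30.0   # during the study period

# Change over time within each group
treated_change = treated_during - treated_before   # 6.0
control_change = control_during - control_before   # 10.0

# DID estimate: the treatment effect net of shared time trends and of
# stable group differences, assuming unobserved confounding is constant
# over time (the same assumption PERR makes on the ratio scale).
did_estimate = treated_change - control_change     # -4.0
```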
The death rate was significantly higher in one of our database studies (Syst-Eur) and seemed to be significantly lower in both of the database comparisons with the WHI randomised controlled trial; for the reasons enumerated, however, these latter results should be interpreted cautiously.
Future perspective and study limitations
Thus it seems from our studies that an electronic medical record database can be an important tool for informing evidence based treatment decisions. To maximise the value of future databases, they should be designed with all the advantages enumerated for GPRD and should also overcome its limitations (see box). Ideally, future databases should be much larger than GPRD, which includes about eight million patients. On the basis of our work to date, we estimate that 40-50 million patients are needed for the breadth of future studies we can envisage.
Studies using such databases would not replace the need to do randomised controlled trials but could serve as an important tool to supplement the contributions of trials to evidence based medicine. One example among many is to generalise the results of randomised controlled trials. Although we have not comprehensively examined this issue, our studies have shown the feasibility of further generalising the results of the Syst-Eur and WHI randomised controlled trials.25 58 59
As well as the need for further validation of the PERR technique, several other limitations apply to this investigative effort. The PERR technique should be viewed currently as applicable only to analysis of a study using a design similar to ours, which includes similar inclusion and exclusion criteria for the exposed and unexposed and a defined study start, recruitment interval, and end time. Furthermore, the random matching technique might be critical to assure that bias does not exist in the start time for unexposed patients. Application of the PERR technique to other study designs will require its validation under those conditions.
Another potential shortcoming of our studies is the inability to exactly replicate all aspects of the randomised controlled trial independent of randomisation, such as the exact dose of study drug, the role of placebos, possible differences in health care, and other differences between participants entered in randomised controlled trials and those in the general population. In addition, there is the possibility of inaccurate information in the database (for instance, misclassification of outcomes or ascertainment bias). The reasonably similar results of the database studies and the comparative randomised controlled trials, however, suggest that these were not major problems.
Our current view is that the PERR analysis should not be performed in isolation. We would recommend its use along with conventional biostatistical analyses. When the conventional and PERR analyses are similar, “unmeasured confounding” would seem unlikely; whereas when they differ “unmeasured confounding” would seem likely. When unmeasured confounding seems to be present, the PERR analysis seems to yield a more valid result, but additional evaluation is required to ascertain the veracity of this suggestion.
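As a hedged sketch of this recommendation, the comparison step could look as follows; the confidence-interval overlap check is a crude, hypothetical stand-in for the formal significance tests used in practice, and the function name and intervals are invented.

```python
# Hypothetical sketch: compare a conventional (Cox adjusted) hazard ratio
# with a PERR adjusted hazard ratio to flag likely unmeasured confounding.
# The CI-overlap rule is a simplification, not a formal statistical test.

def interpret(cox_hr_ci, perr_hr_ci):
    """Each argument is a (lower, upper) 95% confidence interval."""
    cox_lo, cox_hi = cox_hr_ci
    perr_lo, perr_hi = perr_hr_ci
    if cox_hi < perr_lo or perr_hi < cox_lo:   # intervals do not overlap
        return "estimates differ: unmeasured confounding likely; weigh PERR result"
    return "estimates agree: unmeasured confounding unlikely"

# Hypothetical intervals for the two analyses of one outcome:
print(interpret((0.70, 0.90), (0.75, 0.95)))   # overlapping -> agree
print(interpret((1.10, 1.40), (0.70, 0.90)))   # disjoint -> differ
```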
What is already known on this topic
- Two major potential problems could impede the capability of an electronic medical record database to provide reliable information concerning drug efficacy: the quality of the data contained within the database and the ability of analyses of observational—that is, non-experimental—data to provide valid results
- The quality of evidence from observational studies is less than from randomised controlled trials because of confounding by indication and other biases related to the effects of unmeasured covariates
What this study adds
- Although observational studies are subject to unmeasured confounding, a new analytical technique, prior event rate ratio (PERR) adjustment, can identify and reduce unmeasured confounding
- Data from properly constructed electronic medical record databases, when analysed with standard statistical methods along with the PERR method, can reveal important insights into the efficacy of medical treatment