Within a large observational data set, the estimated association of invasive cardiac treatment with long-term mortality is sensitive to the analytic method used. Cardiac catheterization predicted a 50% relative decrease in mortality using standard risk-adjustment methods, including a rigorous propensity-based matching analysis, even after accounting for a clinically rich set of prognostic variables. Using instrumental variable methods, the associated relative decrease in mortality was approximately 16%. When estimated treatment associations vary 3-fold depending on the method used, several questions should come to mind.
Do the results have face validity? The survival benefits of routine invasive care from RCTs are between 8% and 21%.4,5
Results in RCTs are optimized and tend to overestimate the relative benefits achievable in routine clinical practice, given the technological expertise and rapid onset of therapy required to produce optimal results. The overestimate of benefit using standard modeling is likely due to residual confounding related to the selection of lower-risk patients for cardiac catheterization.1,2,6
The magnitude of bias may be greater than usual because receiving catheterization required surviving from admission until this treatment. Even controlling for complete information on patients’ admission severity could not eliminate this important survival bias. Such situations are not unusual in observational studies of surgical procedures.
The instrumental variable estimate of a 16% relative survival benefit was closer to RCT results because we used a strong, valid instrumental variable. Although there may be residual unmeasured regional illness differences, this is unlikely since predicted mortality was estimated using strongly prognostic risk factors and was similar for measured covariates across regions. Our instrumental variable predicted a wide range of cardiac catheterization rates (29%–82%). By contrast, McClellan et al9 reported smaller nonsignificant cardiac catheterization effects and larger SEs using an instrumental variable with a smaller range of regional cardiac catheterization rates (15%–27%). Instruments that are more predictive of treatment produce less biased estimates and smaller SEs, and provide closer approximations to the average population effects from RCTs.29,30
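The link between the range of an instrument and the precision of the resulting estimate can be illustrated with a small simulation. The sketch below is not the analysis used in this study: the data-generating model, the coefficients, the continuous outcome index, and the function names (`iv_estimate`, `simulate_once`, `spread_of_iv`) are all hypothetical. Only the two ranges of regional catheterization rates (29%–82% and 15%–27%) and the 16% figure are taken from the text. It illustrates precision only, not the bias properties of weak instruments.

```python
import numpy as np

def iv_estimate(y, t, z):
    """Ratio-of-covariances instrumental variable estimate of the effect of t on y."""
    return np.cov(y, z)[0, 1] / np.cov(t, z)[0, 1]

def simulate_once(rng, n, z_low, z_high, true_effect=-0.16):
    """One simulated cohort; every coefficient here is an illustrative assumption."""
    u = rng.normal(size=n)                       # unmeasured severity
    z = rng.uniform(z_low, z_high, size=n)       # regional catheterization rate (instrument)
    p_treat = np.clip(z - 0.05 * u, 0.0, 1.0)    # sicker patients are treated less often
    t = (rng.uniform(size=n) < p_treat).astype(float)
    # Continuous outcome index; higher means worse prognosis.
    y = true_effect * t + 0.5 * u + rng.normal(scale=0.5, size=n)
    return iv_estimate(y, t, z)

def spread_of_iv(z_low, z_high, n=20_000, reps=200, seed=0):
    """Mean and empirical SD of the IV estimate over repeated simulated cohorts."""
    rng = np.random.default_rng(seed)
    est = np.array([simulate_once(rng, n, z_low, z_high) for _ in range(reps)])
    return est.mean(), est.std(ddof=1)

wide_mean, wide_sd = spread_of_iv(0.29, 0.82)      # wide range of regional rates
narrow_mean, narrow_sd = spread_of_iv(0.15, 0.27)  # narrow range of regional rates
print(f"wide range:   mean={wide_mean:.3f}, SD={wide_sd:.3f}")
print(f"narrow range: mean={narrow_mean:.3f}, SD={narrow_sd:.3f}")
```

Under these assumptions, the estimates based on the narrow-range instrument scatter much more widely around the simulated effect than those based on the wide-range instrument, which is the pattern of larger SEs described above.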
When are standard statistical methods likely to produce unbiased findings? The distributions of unmeasured prognostic factors are more likely to be similar between treatment groups when considering therapies with similar clinical indications and risk, such as typical vs atypical neuroleptics for schizophrenia,31–32 or rofecoxib vs celecoxib cyclooxygenase 2 (COX-2) inhibitors for arthritis.33
Randomized clinical trials and observational studies show the greatest similarities under such conditions.34–35
Observational studies of invasive procedures are more prone to bias because patients who are candidates for surgery often differ in unmeasurable ways from patients who are not. A study using propensity-based matching assessed the effects of in-hospital cardiac catheterization using Cooperative Cardiovascular Project data and found smaller long-term relative mortality rates (0.66–0.75)36; however, classifying patients who received cardiac catheterization after discharge and before 30 days as untreated likely attenuated the effects of cardiac catheterization compared with our study.
Which unmeasured factors might account for selection bias reflective of patient prognosis and physician decision-making behaviors? High-risk cardiac markers, such as dynamic or evolving ST- and T-wave changes, may appear during the hospital stay and require serial electrocardiographic interpretations that are rarely captured in observational studies. Relative contraindications, such as renal insufficiency or previous stroke, rarely conform to dichotomous decisions. Severity of comorbidities is difficult to capture. Referral selection may depend on interactions between comorbidities; for example, patients with concomitant aortic valve disease are more likely to be referred for cardiac catheterization, but less so as renal function progressively declines. Some prognostic factors, such as functional status or transient ischemic attack from previous cardiac catheterization, are not available in usual observational data sets. Social factors, such as employment, language barriers, and patient preferences, are rarely measured in these data. The factors comprising angiography decision making are thus complex, prognostically important, and often unmeasurable.
Is the similarity between multivariable and propensity model estimates expected? Mathematically, controlling for propensity score should produce similar results to model-based risk adjustment, because both control for the same measured covariates.37,38
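The equivalence described above can be checked numerically in a simplified setting. The following sketch assumes a simulated cohort with two measured covariates and one unmeasured confounder, a continuous outcome index, and a propensity score fitted with a linear probability model rather than the logistic model typically used in practice. All variable names and coefficients are hypothetical. With a linear propensity model and least-squares outcome regression, the two adjusted estimates coincide up to rounding; with a logistic propensity model they would be expected to be similar rather than identical.

```python
import numpy as np

def ols_coefs(design, y):
    """Least-squares coefficients for y regressed on the columns of design."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 100_000
true_effect = -0.16                      # simulated treatment effect (assumption)

x1 = rng.normal(size=n)                  # measured covariate
x2 = rng.normal(size=n)                  # measured covariate
u = rng.normal(size=n)                   # unmeasured confounder

# Sicker patients (higher x1, x2, u) are less likely to be treated.
p_treat = np.clip(0.55 - 0.08 * x1 - 0.08 * x2 - 0.10 * u, 0.0, 1.0)
t = (rng.uniform(size=n) < p_treat).astype(float)
y = true_effect * t + 0.3 * x1 + 0.3 * x2 + 0.5 * u + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
X = np.column_stack([ones, x1, x2])      # measured covariates plus intercept

# Multivariable risk adjustment: regress outcome on treatment and covariates.
beta_multivariable = ols_coefs(np.column_stack([t, X]), y)[0]

# Propensity adjustment: fit a linear-probability propensity score on the
# same covariates, then regress outcome on treatment and the score.
propensity = X @ ols_coefs(X, t)
beta_propensity = ols_coefs(np.column_stack([t, ones, propensity]), y)[0]

print(f"simulated effect:        {true_effect:.3f}")
print(f"multivariable adjusted:  {beta_multivariable:.3f}")
print(f"propensity adjusted:     {beta_propensity:.3f}")
```

In this setup both adjusted estimates overstate the simulated benefit by the same amount, because neither approach has access to the unmeasured confounder `u`.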
The utility of instrumental variable analyses depends on finding a strong, valid instrumental variable and careful interpretation.25,26
The instrumental variable estimate measures the treatment effect on the “marginal” population. This excludes those patients who would “always” or “never” receive cardiac catheterization, focusing on patients with uncertain indications whose likelihood of being treated depends on local clinical judgment and catheterization laboratory supply.6,26
The treatment effect must be interpreted as potentially due to the instrument itself, as well as characteristics of care systems associated with the instrument. Along with providing more revascularization and less evidence-based medical treatment, high cardiac catheterization rate regions had more high-volume hospitals with specialized staff and equipment, and coronary care units.6,9
Finally, low cardiac catheterization rate regions did not preferentially select high-risk patients who were more likely to benefit from revascularization, ruling out better clinical decision making as an explanation of the smaller marginal survival effects from instrumental variable analyses.6,39,40
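The interpretation of the instrumental variable estimate as an effect in the "marginal" population can be illustrated with a toy example. The sketch below assumes a binary instrument (high-rate vs low-rate region), three hypothetical patient types with made-up proportions, baseline risks, and treatment effects, and a continuous outcome index. None of these numbers come from the study data. It computes a simple Wald-type ratio estimate and compares it with a naive treated-vs-untreated difference.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical patient types: "always" and "never" are treated the same way
# in every region; "marginal" patients are treated only in high-rate regions.
kind = rng.choice(["always", "never", "marginal"], size=n, p=[0.3, 0.2, 0.5])
z = rng.integers(0, 2, size=n)           # instrument: 1 = high-rate region

t = np.where(kind == "always", 1,
             np.where(kind == "never", 0, z)).astype(float)

# Assumed baseline outcome index and treatment effect for each type.
baseline = np.select([kind == "always", kind == "never"], [0.5, 0.8], default=0.6)
effect = np.select([kind == "always", kind == "never"], [-0.40, -0.05], default=-0.16)
y = baseline + effect * t + rng.normal(scale=0.3, size=n)

# Wald-type estimate: difference in mean outcome between regions divided by
# the difference in treatment rates between regions.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())

# Naive comparison of treated and untreated patients, ignoring patient type.
naive = y[t == 1].mean() - y[t == 0].mean()

print(f"Wald-type estimate: {wald:.3f}")
print(f"naive difference:   {naive:.3f}")
```

Under these assumptions the ratio estimate recovers the effect assigned to the marginal patients, while the naive comparison mixes the effects and baseline risks of all three types.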
When are nontraditional approaches useful? Instrumental variable analyses are most suited to inform policy decisions.26
Because region or physician is often the level at which policy and resource allocation decisions are made, such studies assess the effects of health system factors on patient outcomes. These studies answer policy-relevant questions, such as “What are the benefits of increasing the regional cardiac catheterization laboratory capacity?”, because this would increase the routine provision of invasive services to the AMI population. Other studies have used such designs to evaluate the effects of health care spending,11,41 cardiac management strategies,6 and physician supply42 on patient outcomes. They do not necessarily address questions of clinical effectiveness, such as “What is the effect of providing invasive cardiac treatment to a specific patient?”
Randomized clinical trials cannot be undertaken in all situations in which evidence is needed to guide care. Well-designed observational studies are still needed to assess population effectiveness and to extend results to a general population setting. Our study serves as a cautionary note regarding their analysis and interpretation. First, propensity scores and propensity-based matching have the same limitations as multivariable risk-adjustment methods, and are no more likely to remove bias due to unmeasured confounding when strong selection bias exists. Second, instrumental variable analyses may remove both overt and hidden biases but are more suited to answer policy questions than to provide insight into a specific clinical question for a specific patient. Caution is advised regarding clinical protocols and policy statements for invasive care based on expected mortality benefits derived from traditional multivariable modeling and propensity score risk adjustment of observational studies.