In this study, we evaluated statistical methods for addressing observed and unobserved confounding in observational analyses of treatment for early-stage prostate cancer and prostate cancer–specific and all-cause mortality. We compared our findings with those from a clinical trial that provided the benchmark results: patients aged 65 years or older receiving radical prostatectomy or conservative management had similar prostate cancer–specific mortality (relative risk [RR] = 0.87, 95% CI = 0.51 to 1.49) and all-cause mortality (RR = 1.04, 95% CI = 0.77 to 1.40) (28). Contrary to these benchmark results, multivariable regression analysis and the propensity score reweighting methods produced very similar implications (ie, that aggressive treatment by radical prostatectomy was associated with statistically significantly better survival than conservative management). Consistency between propensity score reweighting and traditional multivariable analysis is not uncommon (45
The instrumental variable results, which accounted for unobserved confounding, were more similar to results from the benchmark randomized controlled trial than to results from the unadjusted multivariable regression and propensity score reweighted analyses. Alternative specifications of the instrumental variable produced different point estimates of the hazard rate, although none was statistically significant. Findings from this study suggest that the instrumental variable approach may be useful in comparative effectiveness studies of observational databases of other treatments for other diseases. In particular, the lagged area treatment variable that we constructed from the SEER-Medicare data may be a good instrument in studies of other cancer treatments.
Previous studies (21) have used both propensity score and instrumental variable analyses with the same data to assess treatment outcomes. Earle et al. (21) examined the effect of chemotherapy on survival in elderly patients with stage IV non–small cell lung cancer and compared these results with those of a randomized controlled trial. In that study, both the instrumental variable and propensity score analyses produced results that were similar to those of the randomized controlled trial, although median follow-up was much shorter for this acute disease, and there were stronger associations between observable clinical characteristics and survival among patients with lung cancer than among those with prostate cancer. Other studies have investigated the effects of invasive cardiac management on acute myocardial infarction survival (39) and adherence to two oral antidiabetic drug therapies that differ in patient tolerance, adverse events, and side effects (47). The acute myocardial infarction survival study compared multivariable regression, two propensity score methods, and instrumental variable analysis with randomized controlled trial findings and reported similar findings from the multivariable and two propensity score methods. However, only the instrumental variable findings were comparable to the results from randomized controlled trials, indicating that there was selection bias caused by unobservable confounders that could not be adjusted for by propensity score analysis. In the drug adherence study, multivariable, propensity score, and instrumental variable findings were similar, indicating that any selection bias was caused by observable factors.
This study and previous studies (21) indicated that if unobservable factors were not a major source of bias, then instrumental variable and propensity score methods should provide similar results. Whether the instrumental variable and propensity score results support or contradict unadjusted results depends on the extent of selection bias in assigning patients to alternative treatments. These differences across studies also suggest that it may not be possible to generalize about the choice of a statistical method across different clinical conditions. Instrumental variable analysis in principle is the more robust approach because it adjusts for both observable and unobservable potential sources of bias. However, this advantage depends critically on the identification of a valid and plausible instrument, which is controversial because there is no definitive test of the instrument's lack of association with the health outcome. In addition, if the instrument is not strongly associated with the treatment received, the estimate of the treatment effect will be highly imprecise. Thus, it is difficult to distinguish between a true lack of statistical significance between treatment outcomes and an imprecise statistical estimate from a weak instrument. The differences in the estimated hazard rates between the two instrumental variable models that we used illustrate this concern.
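The trade-off described above can be illustrated with a small simulation. The sketch below is not the Cox-model instrumental variable estimation used in this study; it assumes a simplified linear outcome, a continuous treatment intensity, and a single instrument, and all function names and parameter values are hypothetical. It shows a naive regression picking up bias from an unobserved confounder, the single-instrument (Wald) estimate staying close to the true null effect, and the first-stage F statistic falling when the instrument is only weakly related to treatment.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance; cov(xs, xs) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols_slope(d, y):
    """Naive regression of outcome on treatment (ignores confounding)."""
    return cov(d, y) / cov(d, d)


def iv_slope(z, d, y):
    """Single-instrument IV (Wald) estimate: cov(z, y) / cov(z, d)."""
    return cov(z, y) / cov(z, d)


def first_stage_f(z, d):
    """F statistic for the instrument in the first-stage regression of d on z."""
    n = len(z)
    slope = cov(z, d) / cov(z, z)
    intercept = mean(d) - slope * mean(z)
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    slope_var = (rss / (n - 2)) / (cov(z, z) * (n - 1))
    return slope ** 2 / slope_var  # squared t statistic of the slope


def simulate(n, strength, true_effect=0.0, seed=0):
    """Simulated data (an assumption, not study data): u is an unobserved
    confounder that raises both treatment intensity d and outcome y."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)
        zi = rng.gauss(0, 1)  # instrument, independent of u
        di = strength * zi + u + rng.gauss(0, 1)
        yi = true_effect * di + u + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


# Strong instrument: IV estimate should be near the true effect of 0.
z, d, y = simulate(n=20000, strength=1.0)
naive = ols_slope(d, y)
iv = iv_slope(z, d, y)
f_strong = first_stage_f(z, d)

# Weak instrument: the first-stage F statistic is far smaller.
zw, dw, yw = simulate(n=20000, strength=0.02, seed=1)
f_weak = first_stage_f(zw, dw)

print(f"naive slope = {naive:.3f}, IV slope = {iv:.3f}")
print(f"first-stage F: strong = {f_strong:.1f}, weak = {f_weak:.1f}")
```

In this toy setting the naive slope is biased away from zero by construction, whereas the instrumental variable estimate is not. A small first-stage F statistic is a common warning sign that the instrumental variable estimate may be too imprecise to interpret.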
One advantage of using SEER-Medicare data for comparative effectiveness studies of alternative cancer treatments is that the lagged treatment pattern in the local geographic area, which we used in this study, is a potentially readily available choice for an instrumental variable, as long as there is sufficient variation across small geographic areas and there are enough patients in each treatment group to generate reasonably stable local area estimates. Similar treatment propensity measures can be constructed for other cancers. An important innovation in this study was that the instrumental variable was defined as the difference between the actual and predicted treatment proportions in the geographic area because the underlying characteristics of patients are not likely to be similar across geographic areas.
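As a rough illustration of the construction described above, the following sketch computes, for each geographic area and year, the difference between the actual and predicted treatment proportions and assigns it to patients diagnosed in a later year. The record fields (`area`, `year`, `treated`, `predicted_prob`), the one-year lag, and the source of the predicted probabilities (for example, a patient-level model of treatment choice) are assumptions made for illustration and are not taken from the study's actual specification.

```python
from collections import defaultdict


def lagged_area_instrument(records, lag=1):
    """Return {(area, year): actual minus predicted treatment proportion
    observed in that area `lag` years earlier}.

    Each record is a dict with hypothetical keys:
      area            - geographic area identifier
      year            - year of diagnosis
      treated         - 1 if the patient received the treatment, else 0
      predicted_prob  - model-predicted probability of treatment given
                        the patient's observed characteristics
    """
    # Accumulate [sum of treated, sum of predicted, patient count] per cell.
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for r in records:
        cell = cells[(r["area"], r["year"])]
        cell[0] += r["treated"]
        cell[1] += r["predicted_prob"]
        cell[2] += 1

    # Actual proportion minus predicted proportion, shifted forward by the lag.
    return {
        (area, year + lag): (treated - predicted) / count
        for (area, year), (treated, predicted, count) in cells.items()
    }


records = [
    {"area": "A", "year": 2000, "treated": 1, "predicted_prob": 0.5},
    {"area": "A", "year": 2000, "treated": 1, "predicted_prob": 0.5},
    {"area": "A", "year": 2000, "treated": 0, "predicted_prob": 0.5},
    {"area": "A", "year": 2000, "treated": 1, "predicted_prob": 0.5},
    {"area": "B", "year": 2000, "treated": 0, "predicted_prob": 0.5},
    {"area": "B", "year": 2000, "treated": 0, "predicted_prob": 0.5},
]
instrument = lagged_area_instrument(records)
print(instrument)
```

Subtracting the predicted proportion is intended to remove the part of area-level treatment variation that is explained by observed patient mix, so that what remains reflects local practice style. In practice, cells with few patients would give unstable values and might need to be pooled or excluded.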
Findings from this study have important ramifications for physicians who rely on the medical literature to counsel newly diagnosed patients with localized prostate cancer regarding treatment and also patients who learn of newly published findings on the comparative effectiveness of various prostate cancer treatments in the popular press. Given the difficulties in successfully conducting randomized trials of prostate cancer treatments, observational data may form the preponderance of evidence that treating physicians will rely on to guide their discussions with patients. Many practicing physicians may not have the time or expertise to evaluate the biases inherent in observational reports published in academic journals. Thus, when observational data analyses are published without the appropriate methodology to account for observed and unobserved sources of bias, treating physicians may ascribe inappropriate validity to their findings when advising patients about treatment choice.
A recent study (13) and accompanying editorial (2) underscore the very real nature of this problem. Using a propensity score methodology, Wong et al. (13) found, as we did in this analysis, that aggressive management (either surgery or radiation) was associated with better survival than conservative management among older patients with prostate cancer. The accompanying editorial (2) noted that the findings of the study seemed counter to clinical intuition and that perhaps there was inadequate risk adjustment. Despite this concern, the article received substantial coverage in the lay press (48
Sensitivity analyses (50) can reassure clinicians that the results are robust to alternative assumptions about the presence of a hypothetical confounder. For example, Wong et al. (13) showed that relative to their primary result that active treatment was associated with statistically significantly better survival than conservative management (HR for mortality = 0.69, 95% CI = 0.66 to 0.72), the effect of an omitted confounder would have to be large to generate a result of no difference in mortality. However, such a finding does not mean that there were no unobserved confounders or that actual treatment decisions might not be influenced by multiple unobserved factors, which alone might make a small contribution but in combination might influence treatment decisions in a systematic way. Moreover, our analysis indicated that unobserved confounding may in fact be large because we found that the propensity score–adjusted survival estimates (HR range = 1.46–1.56) were higher than the instrumental variable estimate (HR = 1.09, 95% CI = 0.46 to 2.59) and the estimate from the randomized controlled trial (RR = 1.04, 95% CI = 0.77 to 1.40).
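The logic of such a sensitivity analysis can be sketched with the classical external adjustment formula for a single binary unmeasured confounder. This is not necessarily the method used by Wong et al. or in the cited sensitivity analysis literature; it assumes no effect modification and treats the hazard ratio as an approximation to a risk ratio. The function names and the scenario values are hypothetical, and only the observed hazard ratio of 0.69 comes from the text above.

```python
def confounding_bias(rr_confounder, prev_treated, prev_untreated):
    """Multiplicative bias from one binary unmeasured confounder.

    rr_confounder  - confounder's effect on mortality (ratio scale)
    prev_treated   - prevalence of the confounder among treated patients
    prev_untreated - prevalence among conservatively managed patients
    """
    return (1 + prev_treated * (rr_confounder - 1)) / (
        1 + prev_untreated * (rr_confounder - 1)
    )


def externally_adjusted(observed, rr_confounder, prev_treated, prev_untreated):
    """Observed ratio divided by the bias factor."""
    return observed / confounding_bias(rr_confounder, prev_treated, prev_untreated)


observed_hr = 0.69  # primary result reported by Wong et al. (13)

# Hypothetical scenarios: (confounder effect, prevalence treated, prevalence untreated)
scenarios = [
    (2.0, 0.3, 0.3),  # balanced confounder: no bias
    (2.0, 0.2, 0.5),  # moderate imbalance
    (2.0, 0.1, 0.6),  # large imbalance
]
for rr, p1, p0 in scenarios:
    adjusted = externally_adjusted(observed_hr, rr, p1, p0)
    print(f"RR={rr}, p_treated={p1}, p_untreated={p0}: adjusted HR = {adjusted:.3f}")
```

Under these assumptions, a confounder that doubles mortality and is present in 20% of treated versus 50% of conservatively managed patients would move the estimate from 0.69 to about 0.86, and an imbalance of 10% versus 60% would move it to about 1.0. A single strong confounder of that kind may seem implausible, but several weaker unmeasured factors acting together could produce a comparable imbalance.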
For the sake of patients and health-care providers who use study results to make life-changing decisions, researchers need to use multiple methods of risk adjustment, such as propensity score reweighting and instrumental variable analysis, to confirm that the results are not sensitive to the method of risk adjustment. If differences are noted, the clinical plausibility and statistical validity of the various approaches should be reexamined and results should be considered with appropriate caution. Patient selection into specific treatments on the basis of factors related to prognosis is an important consideration in all observational studies, but particularly in studies involving cancer in which the incidence is highest in the elderly who are also most likely to have multiple comorbidities.
As we have noted, instrumental variable analysis does not guarantee that all observational data bias is eliminated. The variable(s) selected as the instrument should have a statistically significant association with treatment choice but not with the health outcome or with unobserved factors that influence the health outcome. Although there are guidelines for assessing whether an instrument is a strong predictor of the treatment received, there is no definitive test for an instrument's validity. If the instrument is weak, the extent of bias in the instrumental variable estimate may be greater than in the unadjusted observational data. For example, our alternative instrumental variable analysis that used both the lagged area treatment and measures of local area medical resources, which were not strongly related to the treatment received, resulted in much better all-cause survival (HR = 0.71, 95% CI = 0.31 to 1.59). Although this estimate had a large confidence interval and the hypothesis of no difference in survival could not be rejected, its low point value could lead some to conclude that conservative management was associated with better survival than radical prostatectomy in this population of men between the ages of 66 and 74 years. Although use of multiple instruments may increase the ability to explain the treatment received, it can also increase the likelihood of an association between the instrumental variable and the health outcome. It is also important to recognize that the result of the instrumental variable analysis is limited primarily to the population on the treatment margin (ie, men who do not have strong indications that would favor one treatment approach over the other).
Given these caveats, however, if a conceptually plausible instrument that has a strong and statistically significant association with the treatment can be found, instrumental variable analysis should provide a potentially important alternative and complementary methodology to propensity score methods for assessing treatment outcomes without having to make any a priori assumptions about the potential magnitude of unobserved confounders.
There were several limitations in our study. First, the benchmark Scandinavian randomized controlled trial (28) that we used to assess the alternative statistical methods of adjusting for observational data bias is only one study and is not representative of all elderly prostate cancer patients in the United States. It was also limited to the comparison of radical prostatectomy and conservative management, excluding men who were treated by radiation therapy. Although we selected our study sample of patients from the SEER database to be as similar as possible to patients in the benchmark clinical trial, we were not able to include prostate cancer patients who were younger than 66 years, a group that represented approximately 46% of the trial population (28). The effectiveness of treatment varied in the two age groups (ie, <65 and ≥65 years), and we compared our findings with the population aged 65 years or older. Consequently, our analysis should be viewed primarily from a methodological perspective rather than as an analysis with direct implications for clinical treatment. Second, our sample was restricted to the approximately 85% of Medicare enrollees in fee-for-service plans. Prostate cancer stage at diagnosis has been reported to be similar in Medicare managed care and fee-for-service settings; however, among patients with clinically localized disease, treatment varies by setting (8). Patients with early-stage prostate cancer in managed care were less likely to receive radical prostatectomy and more likely to receive radiation or conservative management than similar patients in fee-for-service settings (8). Third, enrollment in Medicare managed care changed during the study period, which may also limit the generalizability of our findings to the managed care population. Fourth, we did not have information about prostate-specific antigen screening before diagnosis. Use of prostate-specific antigen screening increased dramatically over the period of our study (52), and the number of men diagnosed with early-stage prostate cancer increased accordingly (55). Although we included year of diagnosis in our models, we could not identify which prostate cancer patients were diagnosed because of elevated prostate-specific antigen levels and which were diagnosed because of symptoms. Lastly, a complete statistical assessment of the Cox hazard model's proportionality assumption indicated that the effects of some covariates may not be time invariant, especially in the analysis of all-cause mortality. Although a sensitivity analysis of the effects of allowing time-varying covariates did not alter the principal findings with regard to treatment effects, further analysis of time-varying effects may be warranted.
In summary, survival after radical prostatectomy or conservative management in elderly patients with early-stage prostate cancer as calculated by instrumental variable estimation of Cox proportional hazard models from observational data was similar to that calculated by Cox proportional hazard models from clinical trial data. Consequently, instrumental variable analysis may be a useful technique in comparative effectiveness studies of prostate cancer and other cancer treatments if an acceptable instrument can be identified. Future research is warranted to evaluate additional methods of addressing confounding in observational data and to compare results from these methods with those from benchmark randomized controlled trials.
National Cancer Institute (HHSN2612007003 39P).