Using observational data to assess the relative effectiveness of alternative cancer treatments is limited by patient selection into treatment, which often biases interpretation of outcomes. We evaluated methods for addressing confounding in treatment and survival of patients with early-stage prostate cancer in observational data and compared findings with those from a benchmark randomized clinical trial.
We selected 14302 early-stage prostate cancer patients who were aged 66–74 years and had been treated with radical prostatectomy or conservative management from linked Surveillance, Epidemiology, and End Results–Medicare data from January 1, 1995, through December 31, 2003. Eligibility criteria were similar to those from a clinical trial used to benchmark our analyses. Survival was measured through December 31, 2007, by use of Cox proportional hazards models. We compared results from the benchmark trial with results from models with observational data by use of traditional multivariable survival analysis, propensity score adjustment, and instrumental variable analysis.
Prostate cancer patients receiving conservative management were more likely to be older, nonwhite, and single and to have more advanced disease than patients receiving radical prostatectomy. In a multivariable survival analysis, conservative management was associated with greater risk of prostate cancer–specific mortality (hazard ratio [HR] = 1.59, 95% confidence interval [CI] = 1.27 to 2.00) and all-cause mortality (HR = 1.47, 95% CI = 1.35 to 1.59) than radical prostatectomy. Propensity score adjustments resulted in similar patient characteristics across treatment groups, although survival results were similar to those from traditional multivariable survival analyses. Results for the same comparison from the instrumental variable approach, which theoretically equalizes both observed and unobserved patient characteristics across treatment groups, differed from the traditional multivariable and propensity score results but were consistent with findings from the subset of elderly patients with early-stage disease in the trial (ie, conservative management vs radical prostatectomy: for prostate cancer–specific mortality, HR = 0.73, 95% CI = 0.08 to 6.73; for all-cause mortality, HR = 1.09, 95% CI = 0.46 to 2.59).
Instrumental variable analysis may be a useful technique in comparative effectiveness studies of cancer treatments if an acceptable instrument can be identified.
Although randomized controlled trials provide the best assessment of alternative treatments, such trials are costly, time consuming, and may have limited generalizability. Observational data might be an alternative, but these data can be limited by confounding.
Data from early-stage, elderly prostate cancer patients who had been treated with radical prostatectomy or conservative management were from the linked Surveillance, Epidemiology, and End Results–Medicare database. Observational data were examined by use of traditional multivariable survival analysis, propensity score adjustment, and instrumental variable analysis. Results were compared with those from a benchmark randomized trial.
Propensity score adjustments resulted in similar patient characteristics across treatment groups, and survival results were similar to those from traditional multivariable survival analyses. The instrumental variable approach, which theoretically equalizes both observed and unobserved patient characteristics across treatment groups, yielded results that differed from the multivariable and propensity score results but were consistent with findings from a subset of elderly patients with early-stage disease in the randomized trial.
Instrumental variable analysis may be a useful technique in comparative effectiveness studies of cancer treatments.
The study population was restricted to Medicare enrollees in fee-for-service plans. Data from only one randomized trial were used, and its study sub-population of elderly prostate patients may have been under-powered.
From the Editors
Comparing the effectiveness of alternative cancer treatments is a critical objective for medical and health services researchers. Practicing physicians need to have valid information about the risks and benefits of alternative treatments to discuss options with their patients and make treatment recommendations. Moreover, in a climate of rapidly growing health-care costs and constrained national resources, health-care policymakers need comparative effectiveness information to make decisions about reimbursement rates and insurance coverage.
The randomized controlled trial is considered the most valid methodology for assessing treatments’ efficacy. However, randomized controlled trials are costly, time consuming, and frequently not feasible because of ethical constraints. Moreover, some randomized controlled trial results have limited generalizability because of differences between randomized controlled trial study populations, who may be screened for eligibility on the basis of age and comorbidities, and community populations, who are likely to be much more heterogeneous with regard to health conditions and socioeconomic characteristics.
Given the need for comparative effectiveness information and the limitations of randomized controlled trials, investigating the feasibility of using observational data from actual medical practice in comparative effectiveness studies as a complement to randomized controlled trials is important. However, observational studies are subject to biases that are caused by selection of patients into treatments for reasons related to expected survival (eg, patients with a better prognosis may be more likely to receive one treatment over another) and by the inability to observe all relevant information (1–4). Patient selection into specific treatments is an important consideration in all observational studies, but particularly for those in prostate cancer, because incidence is highest in the elderly, who are also most likely to have multiple comorbidities.
The number of published studies that used observational data to assess the effectiveness of cancer treatment has increased dramatically in the past decade (5–9) and is likely to increase even more rapidly with the growing emphasis on comparative effectiveness research (10). In addition to traditional multivariable regression analyses, researchers have used propensity score analysis to adjust for differences in observed patient and physician characteristics. Observational studies (1,11–13) have previously used traditional regression and propensity score methods to evaluate associations between specific prostate cancer treatments with survival. In these studies, the propensity score methods did not completely balance (ie, equalize) important patient characteristics such as tumor grade, size, and comorbidities across treatment groups. Furthermore, patients who received active treatment had better survival for noncancer causes of death than patients who received conservative management, indicating that unobserved differences between groups affected both treatment choice and survival.
Instrumental variable analysis is a statistical technique that uses an exogenous variable (or variables), referred to as an “instrument,” that is hypothesized to affect treatment choice but not to be related to the health outcome (14–17). Variations in treatment that result from variations in the value of the instrument are considered to be analogous to variations that result from randomization and so address both observed and unobserved confounding. Instrumental variable analysis has been used with observational data to investigate clinical treatment effects among patients with breast cancer (18–20), lung cancer (21), or prostate cancer (5,22).
Early-stage prostate cancer is an ideal disease for comparing analytic techniques for addressing observed and unobserved confounding because it is a common cancer among men with uncertainty as to which treatment is optimal (23). Geographic variation in surgical treatment has been consistently reported in the United States (7,24,25), and patient selection into different treatments on the basis of their ages and/or comorbid conditions is particularly a concern in evaluations of prostate cancer treatments (24–27). In this study, we evaluated traditional multivariable regression, propensity score, and instrumental variable analyses for addressing observed and unobserved confounding among patients with early-stage prostate cancer who were treated with radical prostatectomy or conservative management. We also compared findings from these analyses with those from a benchmark randomized clinical trial that evaluated the same two treatments (28,29).
This study evaluated three statistical techniques (ie, traditional multivariable regression analysis, propensity score analysis, and instrumental variable analysis) for assessing survival among early-stage prostate cancer patients who received either radical prostatectomy or conservative management within 6 months of diagnosis. We used the linked Surveillance, Epidemiology, and End Results (SEER)–Medicare data to identify patients with early-stage prostate cancer, to measure treatment, and to assess prostate cancer–specific and all-cause mortality (30). Findings were compared with those from a benchmark randomized controlled trial that was conducted in Scandinavia (28,29). This Scandinavian trial compared radical prostatectomy with conservative management among newly diagnosed patients with early-stage prostate cancer. To the extent possible, we selected patients by use of the same eligibility criteria as the clinical trial.
We used data from the SEER program maintained by the National Cancer Institute that was linked to Medicare claims data. The SEER registries (which include the metropolitan areas San Francisco–Oakland, Detroit, Seattle, Atlanta, San Jose–Monterey, and Los Angeles county, and the states of Connecticut, Hawaii, Iowa, Kentucky, Louisiana, New Mexico, New Jersey, Utah, and the remainder of California) represent approximately 26% of the US population (1). For each person diagnosed within these defined geographic catchment areas, the SEER registries collect information on every occurrence of a primary incident cancer, the month and year of diagnosis, cancer site, stage, histology, initial treatment, and vital status including cause of death for patients who died.
Cancer patients reported to SEER from January 1, 1973, through December 31, 2005, have been matched against Medicare's master enrollment file, and Medicare claims have been extracted for those with fee-for-service coverage. Among patients aged 65 years or older with a cancer diagnosis recorded in the SEER data, 94% have been linked with Medicare enrollment data (2). A more detailed description of the linked SEER-Medicare data is available at http://healthservices.cancer.gov/seermedicare/.
Patient demographic characteristics and vital status were obtained from Medicare enrollment data. Information about inpatient and outpatient care, specifically surgery, radiation therapy, injectable hormone treatment, and chemotherapy was obtained from SEER data and Medicare claims. The inpatient (Medicare Provider Analysis and Review), Hospital Outpatient, and Carrier Medicare claims files were used in this study.
We used the eligibility criteria in the randomized controlled trial that compared radical prostatectomy and conservative management (28,29) to select the study population, including newly diagnosed and previously untreated patients with prostate cancer who were younger than 75 years and whose tumor stage was T1 or T2. We selected newly diagnosed early-stage prostate cancer (International Classification of Diseases for Oncology [ICD-O] code C61.9) patients aged 66–74 years in linked SEER-Medicare data from January 1, 1995, through December 31, 2003. Survival was observed for up to 12 years, through December 31, 2007. The median survival time from date of diagnosis to December 31, 2007, was 78 months (interquartile range = 48 months). We used diagnosis and procedure codes from the inpatient and outpatient Medicare claims in the year before diagnosis to classify patient comorbid status that was based on a series of condition indicators and condition-specific weights by use of the NCI combined comorbidity index (26).
We defined radical prostatectomy within 6 months of diagnosis from SEER surgery codes and International Classification of Diseases, Ninth Edition (ICD-9), and Current Procedural Terminology, Fourth Edition (CPT-4), codes from the Medicare claims (Appendix Table 1). Conservative management was defined as no radiation, surgery, hormonal treatment, or chemotherapy within 6 months of diagnosis. Among the 110857 newly diagnosed elderly patients with prostate cancer who had fee-for-service coverage, patients were excluded for the following reasons: unusual histology (n = 2149), identified as having cancer through a death certificate or autopsy (n = 291), not from a SEER registry (n = 283), month of diagnosis or date of death unknown (n = 977), aged 65 years and no data for previous year (n = 10806), incomplete Medicare Part A and Part B data because of managed care enrollment or only Part A enrollment for 1 year before or after diagnosis (n = 39417), distant stage or not clinical stage T1 or T2 disease (n = 21512), and treatment with chemotherapy, radiation therapy, or hormone therapy but without surgery (n = 17607). The remaining 17815 patients were used to construct the propensity scores and the primary instrumental variable (ie, the lagged [previous year’s], local area, adjusted probability of receiving conservative management). The final sample of 14302 patients for the estimation of survival models resulted from eliminating patients in geographic areas with fewer than 50 patients over the entire observation period (n = 561) and using a lagged value of the primary instrumental variable (n = 2952).
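The sample attrition described above can be checked with simple arithmetic. The following is a minimal sketch that only restates the counts reported in the text; the dictionary keys and variable names are illustrative labels, not names from the study's datasets.

```python
# Counts taken directly from the text; key names are illustrative labels only.
initial_cohort = 110857

exclusions = {
    "unusual_histology": 2149,
    "death_certificate_or_autopsy": 291,
    "not_from_seer_registry": 283,
    "unknown_diagnosis_month_or_death_date": 977,
    "aged_65_no_prior_year_data": 10806,
    "incomplete_part_a_b_data": 39417,
    "distant_or_not_t1_t2_stage": 21512,
    "chemo_radiation_or_hormone_without_surgery": 17607,
}

# Patients used to build the propensity scores and the primary instrument.
propensity_sample = initial_cohort - sum(exclusions.values())

# Further losses before the survival models were estimated.
small_area_exclusions = 561      # areas with fewer than 50 patients
lagged_instrument_losses = 2952  # lost by using a lagged instrument value

survival_sample = propensity_sample - small_area_exclusions - lagged_instrument_losses

print(propensity_sample, survival_sample)
```

The two derived totals agree with the sample sizes of 17815 and 14302 reported in the text.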
The primary health outcome measures are the number of months of survival from diagnosis to death or the end of the observation period. We measured both death from prostate cancer from SEER and death from any cause from Medicare claims.
The key assumption of a randomized controlled trial is that random assignment to treatment groups effectively holds constant the effects of all observed and unobserved patient characteristics on health. In this analysis, we investigated three analytic alternatives to a randomized controlled trial design that used observational data and statistical methods, rather than randomization, to hold constant the effects of factors other than the treatments that might affect health at the end of the observation period.
Multivariable regression analysis is the conventional analytic approach for comparing groups. It holds constant the effects of observable factors by including them as covariates in the regression model, but its key assumption is that unobserved factors that affect health are not associated with the treatment received. If this assumption is violated, then the magnitude and possibly the direction of the estimated treatment effect will be biased. The clearest examples of this type of bias occur when there are systematic differences in unobserved health between patients receiving different treatments (1).
Propensity score analysis addresses the potential problem that some patient characteristics may vary systematically and substantially across treatment groups so that the other independent variables cannot adequately control for the effects of nonoverlapping characteristics and thus leads to a biased estimate of the effect of the treatment on health. Propensity score analysis also implicitly makes the fundamental assumption that balancing (ie, equalizing) the observable patient characteristics across treatment groups minimizes the potential bias from unobservable factors (31–33).
In practice, there are several approaches to balancing characteristics across treatment groups that typically begin by estimating a logistic regression model to calculate a propensity score [ie, the probability of receiving the particular treatment as a function of all measured factors (ie, confounders) that might affect the treatment outcome (34)]. Factors that arguably influence only treatment choice, but not outcome, should not be included in estimating the propensity score.
After each patient is assigned a propensity score, there are three general strategies for “balancing” patients’ characteristics across treatment groups (35): 1) grouping, which subdivides patients into homogeneous subgroups on the basis of the propensity score; 2) matching, which essentially pairs patients with identical or nearly identical propensity scores across treatment arms; and 3) weighting, which assigns patients differential weights on the basis of their propensity score. In each of these approaches, the primary goal is to develop samples of treated and untreated patients who are as similar as possible and arguably mimic samples that would be created by randomization.
Previous studies (34,35) indicate that the weighting approach is the most general and most efficient because it uses all available data and does not require any arbitrary decisions with regard to grouping or matching. Therefore, in this analysis, we proceeded by estimating the propensity score (ps) from a logistic regression model of the probability of conservative management relative to aggressive (surgical) treatment as a function of clinical and demographic characteristics. We then compared two weighting approaches that make different assumptions about the distribution of propensities between the treated and untreated (or control) populations (34). 1) Specifically, the inverse probability of treatment weighting assigns weights of 1/ps for patients receiving conservative management and 1/(1 − ps) for patients receiving radical prostatectomy. This approach assumes that the two patient populations are reasonably similar and that the treatment could be applied to the entire study population. 2) Standardized mortality ratio weighting assigns a weight of 1 for treated patients (conservative management) and a weight of [ps/(1 − ps)] for untreated patients (radical prostatectomy). Under this assumption, the estimated treatment effect applies to the subpopulation that has characteristics similar to the treated population. This approach is more appropriate when the study populations in the two treatment groups are very different.
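The two weighting schemes can be written out in a few lines. This is a minimal sketch under the definitions given above, with conservative management as the "treated" group; the function names and the example propensity values are hypothetical, and the study itself estimated its models in SAS.

```python
def iptw_weight(ps: float, conservative: bool) -> float:
    """Inverse probability of treatment weight.

    ps is the estimated probability of receiving conservative management.
    """
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / ps if conservative else 1.0 / (1.0 - ps)


def smr_weight(ps: float, conservative: bool) -> float:
    """Standardized mortality ratio weight.

    Treated (conservative management) patients keep a weight of 1;
    untreated (radical prostatectomy) patients are weighted by the odds
    ps / (1 - ps) so that they resemble the treated group.
    """
    if not 0.0 < ps < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 if conservative else ps / (1.0 - ps)


# Hypothetical patients: (propensity score, received conservative management)
patients = [(0.25, True), (0.25, False), (0.60, True), (0.60, False)]
for ps, conservative in patients:
    print(ps, conservative, iptw_weight(ps, conservative), smr_weight(ps, conservative))
```

In this sketch, a conservatively managed patient with a low propensity score receives a large inverse probability weight, which illustrates why that scheme can be sensitive to extreme propensity values when the two groups differ greatly.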
Instrumental variable analysis addresses a potential limitation of traditional regression analysis and propensity score analysis by seeking to adjust for the effects of both observable and unobservable characteristics. The key challenge of instrumental variable analysis is identifying at least one factor that statistically significantly affects treatment choice but is not related to the health outcome. Variations in treatment that result from this exogenous factor, referred to as an instrument, can be regarded as similar to randomization because random assignment is itself essentially an exogenous instrument in that it affects treatment but is unrelated to the subsequent effect on health. Like treatment groups created by randomization, patients who receive different treatments because of variation in the value of the instrument should have similar observable and unobservable characteristics.
In general, there are two assumptions and conditions for using instrumental variable analysis (36,37). First, the instrument should have a statistically significant impact on the treatment (ie, should be a statistically significant cause of treatment variation). In practice, this condition has been translated into a rule of thumb that the test for the statistical significance of the instrument(s) should have an F statistic value of at least 10 and that the instrument should account for a meaningful share of the observed variation in the treatment, because, if the instrument does not explain very much of the variation in the treatment, then predicted treatment will not differ very much across patient populations.
Second, the instrument should not be associated with the health outcome or with unobserved factors that might influence the health outcome. However, there is no definitive statistical test that proves an instrument's validity. In effect, any potential instrument also needs to satisfy a plausibility condition (ie, a logical and convincing argument that justifies the instrument as a factor that statistically significantly influences the treatment received but is not associated with either the patient's current health or the treatment outcome).
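The rule of thumb for the first condition can be illustrated with a deliberately simplified calculation. The sketch below assumes a single instrument, no covariates, and a linear first stage, so it is not the logistic first-stage model with control variables used in this study; the function name and the toy data are hypothetical.

```python
def first_stage_f(instrument, treatment):
    """F statistic for a one-regressor linear first stage.

    With a single instrument and an intercept, F = R^2 * (n - 2) / (1 - R^2).
    """
    n = len(instrument)
    if n != len(treatment) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_z = sum(instrument) / n
    mean_t = sum(treatment) / n
    sxx = sum((z - mean_z) ** 2 for z in instrument)
    syy = sum((t - mean_t) ** 2 for t in treatment)
    sxy = sum((z - mean_z) * (t - mean_t) for z, t in zip(instrument, treatment))
    if sxx == 0 or syy == 0:
        raise ValueError("instrument and treatment must both vary")
    r_squared = sxy ** 2 / (sxx * syy)
    if r_squared >= 1.0:
        return float("inf")
    return r_squared * (n - 2) / (1.0 - r_squared)


# Toy data: a binary instrument and a binary treatment indicator.
z = [0, 0, 0, 0, 1, 1, 1, 1]
t = [0, 0, 0, 1, 0, 1, 1, 1]
f_stat = first_stage_f(z, t)
print(f_stat, "passes rule of thumb" if f_stat >= 10 else "weak instrument")
```

The second condition has no comparable numerical check, which is why the plausibility argument described above is still required.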
Applying instrumental variable analysis requires a two-stage estimation approach. The first-stage equation predicts the likelihood of receiving the treatment as a function of the instrument and other exogenous factors, and the second-stage equation estimates the effect of the treatment on the health outcome incorporating the “instrumental variable” generated from the first-stage equation. If the health outcome model is nonlinear (eg, a logistic or hazard model), as in this analysis, then the appropriate instrumental variable procedure is the two-stage residual inclusion method (38), which adds the residual (ie, the difference between the actual value of the treatment choice variable and the predicted value) from the first-stage equation as an additional variable in the second-stage equation. In principle, only one instrument is needed to implement instrumental variable analysis for a single comparison of two treatment choices. However, there may be several potential instruments, and investigators should assess the relative strengths and robustness of estimates generated from alternative individual or combinations of instruments.
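The two-stage residual inclusion approach can be summarized in equation form. This is a sketch under assumed notation that does not appear in the original: T_i is the treatment indicator (1 for conservative management), Z_i the instrument, X_i the exogenous covariates, and h_0(t) the baseline hazard.

```latex
\begin{align*}
% First stage: logistic model of treatment choice
\operatorname{logit}\,\Pr(T_i = 1 \mid Z_i, X_i)
  &= \alpha_0 + \alpha_1 Z_i + \alpha_2^{\top} X_i \\
% First-stage residual: actual treatment minus predicted probability
\hat{r}_i &= T_i - \hat{p}_i,
  \qquad \hat{p}_i = \widehat{\Pr}(T_i = 1 \mid Z_i, X_i) \\
% Second stage: hazard model with the residual as an added covariate
h(t \mid T_i, X_i, \hat{r}_i)
  &= h_0(t)\,\exp\!\left(\beta_1 T_i + \beta_2^{\top} X_i + \beta_3 \hat{r}_i\right)
\end{align*}
```

Under this notation, exp(β_1) would be the estimated hazard ratio for conservative management relative to radical prostatectomy, and the residual term is intended to absorb the unobserved factors that influence treatment choice.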
Although not a definitive classification scheme, past instrumental variable analyses (5,18,19,21,39–42) have tended to select instruments of the following types: 1) the frequency of a particular treatment in a geographic area, sometimes referred to as a local area treatment pattern or treatment signature; 2) the treatment pattern of the patient's provider (ie, physician, hospital, clinic) that is based on the pattern of care received by other patients with the same condition treated by that provider; 3) the distance to or availability of a key type of medical resource that is strongly associated with the treatment of interest; 4) the economic cost to the provider and/or the patient of alternative treatments; or 5) natural “experiments” that occur because of changes in policy or institutional structure that are independent of individual patients’ health.
We selected the lagged (ie, previous year’s) local area treatment pattern for conservative management as the primary instrumental variable for three reasons. First, it varies substantially across geographic areas (Supplementary Table 1, available online) and has a highly statistically significant impact on the actual treatment received (Supplementary Table 2, available online). Second, it satisfies the key plausibility criterion of being independent of a current patient's health and other characteristics because it reflects provider treatment decisions from a previous time period (ie, the year before the patient was diagnosed). Third, it can be constructed from readily available data.
The instrument was created by grouping eligible patients from the SEER-Medicare database into hospital referral regions as developed by the Dartmouth Atlas of Health Care (43). A hospital referral region is the set of contiguous zip codes around a major hospital (defined as a hospital that performs cardiovascular surgery and neurosurgery) from which the hospital draws substantial proportions of patients admitted for major cardiovascular surgery or neurosurgery. We defined geographic areas as hospital referral regions because of probable intra-area heterogeneity in treatment patterns within the several geographically large SEER registry areas and small sample concerns that are associated with geographically smaller health-care service areas (ie, 561 patients were deleted after applying the constraint that a hospital referral region have at least 50 patients during the entire study period and 19 adjoining or nearby hospital referral regions were combined to satisfy this condition) (see Supplementary Table 1, available online for a list of the hospital referral regions and their sample sizes).
We constructed the primary instrumental variable for treatment received by use of a two-step process. First, we used the entire dataset (n = 17815) to estimate the probability of receiving conservative management as a function of patients’ clinical characteristics (tumor stage and grade, NCI comorbidity index, and Medicare reimbursements for medical care in the previous year), demographics (age, race, ethnicity, and marital status), year of diagnosis, and all possible interactions among these variables. Second, we calculated the difference between the actual proportion of patients receiving conservative management and the average predicted probability of receiving conservative management (generated from the logistic regression model) in each hospital referral region by year. Areas with relatively large positive differences between the actual and predicted proportions of patients receiving conservative management favor a conservative management treatment pattern, and areas with large negative differences between the actual and predicted proportions of patients receiving conservative management favor a radical prostatectomy treatment pattern. We then lagged this measure of the local area treatment pattern by 1 year and linked it to each patient in the analysis to enhance the instrument's independence from patients’ current health and unobserved characteristics.
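The second step and the 1-year lag can be sketched as follows. The sketch assumes that each patient's predicted probability has already been produced by the first-step logistic model; the record layout, function names, and example values are hypothetical.

```python
from collections import defaultdict


def area_treatment_pattern(records):
    """Actual minus average predicted proportion, by (region, year).

    Each record is (region, year, received_cm, predicted_probability),
    where received_cm is 1 for conservative management and 0 otherwise.
    """
    cells = defaultdict(lambda: [0, 0.0, 0])  # treated count, predicted sum, n
    for region, year, received_cm, predicted in records:
        cell = cells[(region, year)]
        cell[0] += received_cm
        cell[1] += predicted
        cell[2] += 1
    return {key: (c[0] - c[1]) / c[2] for key, c in cells.items()}


def lagged_instrument(pattern, region, diagnosis_year):
    """Previous year's area pattern; None when no previous-year value exists."""
    return pattern.get((region, diagnosis_year - 1))


# Hypothetical region "A": half of the 1995 patients received conservative
# management, whereas the model predicted 40% on average.
records = [
    ("A", 1995, 1, 0.3),
    ("A", 1995, 1, 0.5),
    ("A", 1995, 0, 0.4),
    ("A", 1995, 0, 0.4),
]
pattern = area_treatment_pattern(records)
print(lagged_instrument(pattern, "A", 1996))  # instrument for a 1996 diagnosis
print(lagged_instrument(pattern, "A", 1995))  # None: no 1994 data in this sketch
```

In this sketch, a patient with no previous-year value for his region has no usable instrument, which is consistent with the loss of patients when the lagged value was applied.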
Additional potential instruments were used to construct a second instrumental variable for sensitivity analysis. These variables measure the availability of medical resources in patients’ counties of residence in 2000: total number of patient care physicians, urologists, radiation oncologists, and hospital beds per 100000 population [from the Area Resource File (44)]. Controlling for the overall availability of physicians, we hypothesized that conservative management will be less likely in areas with more hospital beds and more specialists who tend to concentrate on care of prostate cancer patients. Although these measures are likely not to be associated with patients’ underlying health, their direct association with prostate cancer treatment may be relatively weak because they are not specific to Medicare patients with prostate cancer. Moreover, they are measured at only a single point in time, 2000, during the 14-year observation period in this study.
All statistical models included the following control variables: age (66–69 or 70–74 years), race or ethnicity (white non-Hispanic, white Hispanic, African American, or all other races), marital status (single or married), tumor characteristics (stage and grade), previous health problems [as measured by the NCI combined comorbidity index (26) and Medicare reimbursements in the 12 months before diagnosis], and year of diagnosis. Year of diagnosis captured the combined effects of several factors that were changing over the study's time period, including the increase in prostate-specific antigen testing, movements by Medicare beneficiaries into and out of managed care organizations, and changes in Medicare physician reimbursement. These trends potentially change the nature of the underlying population of elderly fee-for-service Medicare patients who are diagnosed with prostate cancer.
Treatment propensity (ie, the predicted probability of receiving conservative management) for the propensity score analysis and for constructing the lagged area treatment pattern for the instrumental variable analysis was estimated with logistic regression. The survival models were estimated with Cox proportional hazard models. Visual inspection of the parallelism of the Kaplan–Meier plots of the logarithms of the estimated cumulative survival models by treatment supported the proportionality assumption. The instrumental variable version of the Cox hazard model was estimated with the two-stage residual inclusion method (38), which has been shown to be appropriate for nonlinear outcome models. This approach adds a separate variable that measures the residual (ie, the difference between the actual value of the [0,1] dependent variable and the predicted probability generated by the logistic first-stage model, which predicts treatment choice as a function of the instrumental variable). All models were estimated with the SAS statistical software (Cary, NC). All statistical tests were two-sided.
Prostate cancer patients receiving conservative management were statistically significantly older, much more likely to be African American, more likely to have more advanced disease, and much less likely to be married than patients receiving radical prostatectomy (Table 1). The conservative management group was also much less likely to have any Medicare claims in the year before diagnosis and, as a result, had a much higher proportion with unknown comorbidities. Characteristics of the two treatment groups changed after applying the statistical adjustments designed to correct for potential observational data biases. As expected after propensity score reweighting that used the inverse probability of treatment weights, the weighted characteristics of the two treatment groups were virtually identical. The one characteristic that was not equalized was the lagged difference between the actual and predicted proportions of patients who received conservative management. This difference between treatment groups in actual and predicted proportions of patients who received conservative management indicates that unobserved differences, which influenced the treatment received between the reweighted populations, may persist even after reweighting. (In data not shown, using the standardized mortality ratio weight also balanced the characteristics of the two treatment groups by adjusting the radical prostatectomy group to have the same reweighted characteristics as the conservative management group.)
Characteristics of the sample were also grouped by the value of the primary instrument for the instrumental variable approach (ie, the lagged area-wide difference between the actual and predicted proportions of patients who were treated by conservative management). Splitting the sample at the median value of the instrument resulted in a 12.8% difference in the instrument's value between the two groups, which meant that by holding clinical characteristics constant, patients in the above-median group were, on average, 6.4% more likely to receive conservative management, and those in the below-median group were 6.4% less likely to receive conservative management. (The range across individual hospital referral regions was from −13.9% in the region including Owensboro, Paducah, and Nashville, TN, to 19.4% in New Haven, CT.) Similarly, 35% of the patients in the above-median group actually received conservative management compared with 27.1% in the below-median group. Thus, the instrument successfully distinguished between patients who were more or less likely to receive conservative management for reasons that were independent of their observed health characteristics. Grouping patients by the value of the instrument narrowed, although it did not eliminate, several of the differences in the observed characteristics. It equalized the values of characteristics measuring disease stage, the number of comorbidities, and the presence and amount of Medicare claims in the year before diagnosis.
The unadjusted observational data and the propensity score reweighted data clearly indicate a statistically significant survival advantage for both prostate cancer–specific death and death from all causes associated with radical prostatectomy (Table 2). (The longer survival time shown for all-cause mortality reflected a lag in reporting cause-specific mortality in the SEER-Medicare data.) The same comparisons for patients grouped by the value of the instrument narrowed the differences in both months of prostate-specific and all-cause survival and mortality percentages, which are not statistically significantly different from each other. The absence of statistically significant differences in the mortality rates was consistent with the findings from the randomized controlled trial after 12 years of follow-up (29).
As in the comparison of means in Table 2, the hazard ratios from the models estimated with the unweighted observational sample and the two propensity score reweighted samples indicated that a large and statistically significant survival advantage was associated with radical prostatectomy (Tables 3–5). In unweighted multivariable survival analysis, conservative management was associated with greater risk of prostate cancer–specific mortality (Tables 3 and 4; hazard ratio [HR] = 1.59, 95% confidence interval [CI] = 1.27 to 2.00) and all-cause mortality (Tables 3 and 5; HR = 1.47, 95% CI = 1.35 to 1.59) than radical prostatectomy. Hazard ratios for conservative management compared with radical prostatectomy by using both propensity score reweighting approaches were similar for prostate cancer–specific and all-cause mortality. In contrast, the hazard ratios estimated by instrumental variable analysis did not show a statistically significant survival advantage for radical prostatectomy (Tables 3 and 6; for prostate cancer–specific mortality, HR = 0.73, 95% CI = 0.08 to 6.73; for all-cause mortality, HR = 1.09, 95% CI = 0.46 to 2.59). Moreover, the instrumental variable estimates were also very similar to the relative risks reported by the benchmark randomized controlled trial. (Tables 4–6 report the complete models underlying the results summarized in Table 3.)
The strength of the primary instrumental variable was indicated by its statistical significance in the first-stage equation that predicts treatment choice for each case and its lack of statistical significance in the second-stage survival model. It was highly statistically significant in the first-stage model (F = 109.5 and P < .001) and accounted for 4.2% of the explained variation, which, although not as large as one would like, was partially attributed to the fact that the first-stage dependent variable was dichotomous rather than continuous. Its independence of the survival outcomes was confirmed by its lack of statistical significance as an independent variable in an alternative version (data not shown) of the Cox survival models (P = .68 in the all-cause survival model and P = .34 in the prostate-specific survival model). A second set of instrumental variable estimates that used additional area variables to construct the treatment instrument resulted in even smaller hazard ratios than reported in Table 3. However, these estimates were not as reliable because of the relatively weaker association between general area variables and prostate cancer treatment choices in older men.
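The first-stage check described above can be sketched with a one-regressor linear probability model, in which the F statistic for the instrument follows directly from the share of variation explained. The data below are simulated, the baseline probability of 0.31 and the instrument range are illustrative assumptions, and the study's own first-stage model also included patient covariates, so this is a simplified stand-in rather than a reproduction of the reported F = 109.5.

```python
import random


def first_stage_f(z, d):
    """Slope, R-squared, and F statistic from a one-regressor OLS of d on z."""
    n = len(z)
    z_bar = sum(z) / n
    d_bar = sum(d) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    tss = sum((di - d_bar) ** 2 for di in d)
    r2 = 1 - rss / tss
    # With one regressor, F has 1 and n - 2 degrees of freedom.
    f_stat = r2 / ((1 - r2) / (n - 2))
    return slope, r2, f_stat


# Simulated data (assumption): z is an area-level instrument and d is a
# binary indicator for conservative management whose probability shifts
# one-for-one with z around a baseline of 0.31.
rng = random.Random(0)
n = 5000
z = [rng.uniform(-0.14, 0.19) for _ in range(n)]
d = [1 if rng.random() < 0.31 + zi else 0 for zi in z]

slope, r2, f_stat = first_stage_f(z, d)
print(f"slope = {slope:.2f}, R-squared = {r2:.3f}, F = {f_stat:.1f}")
```

Even when the instrument shifts treatment probability strongly, the R-squared of a model with a dichotomous outcome stays small, which is consistent with the explanation given above for the modest share of explained variation.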
In this study, we evaluated statistical methods for addressing observed and unobserved confounding in treatment of early-stage prostate cancer patients and prostate cancer–specific and all-cause mortality in observational data. We compared our findings with those from a clinical trial that provided the benchmark results that patients aged 65 years or older receiving radical prostatectomy or conservative management had similar prostate cancer–specific mortality (relative risk [RR] = 0.87, 95% CI = 0.51 to 1.49) and all-cause mortality (RR = 1.04, 95% CI = 0.77 to 1.40) (28,29). Contrary to these benchmark results, both multivariable regression analysis and the propensity score reweighting methods produced very similar implications (ie, that aggressive treatment by radical prostatectomy was associated with statistically significantly better survival than conservative management). Consistency between propensity score reweighting and traditional multivariable analysis is not uncommon (45,46).
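One way to see why propensity score reweighting and multivariable adjustment tend to agree is that both act only on measured characteristics. The sketch below uses inverse probability of treatment weights, which is one common reweighting scheme and not necessarily either of the two schemes used in this study, on a hypothetical cohort of 20 patients with a single observed covariate. The cohort, the covariate, and the function names are all illustrative.

```python
from collections import defaultdict

# Hypothetical cohort (NOT study data): each tuple is
# (advanced_disease, received_prostatectomy).
patients = [(1, 1)] * 2 + [(1, 0)] * 8 + [(0, 1)] * 6 + [(0, 0)] * 4


def stratum_propensity(patients):
    """Estimate P(prostatectomy | covariate) as the treated share per stratum."""
    counts = defaultdict(lambda: [0, 0])  # covariate -> [n treated, n total]
    for x, t in patients:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}


def covariate_mean(patients, group, propensity=None):
    """Mean of the covariate in one treatment group, optionally reweighted."""
    num = den = 0.0
    for x, t in patients:
        if t != group:
            continue
        if propensity is None:
            w = 1.0
        else:
            e = propensity[x]
            # Inverse probability of the treatment actually received.
            w = 1 / e if t == 1 else 1 / (1 - e)
        num += w * x
        den += w
    return num / den


ps = stratum_propensity(patients)
raw_gap = covariate_mean(patients, 1) - covariate_mean(patients, 0)
weighted_gap = covariate_mean(patients, 1, ps) - covariate_mean(patients, 0, ps)
print("unweighted difference in advanced disease:", round(raw_gap, 3))
print("weighted difference in advanced disease:", round(weighted_gap, 3))
```

The weights remove the imbalance in the measured covariate, but any characteristic that is absent from the propensity model is left exactly as unbalanced as before, which is the gap that instrumental variable analysis is intended to address.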
The instrumental variable results, which accounted for unobserved confounding, were more similar to results from the benchmark randomized controlled trial than to results from the unadjusted multivariable regression and propensity score reweighted analyses. Alternative specifications of the instrumental variable produced different point estimates of the hazard ratio, although none was statistically significant. Findings from this study suggest that the instrumental variable approach may be useful in comparative effectiveness studies of observational databases of other treatments for other diseases. In particular, the lagged area treatment variable that we constructed from the SEER-Medicare data may be a good instrument in studies of other cancer treatments.
Previous studies (21,39,47) have used both propensity score and instrumental variable analyses with the same data to assess treatment outcomes. Earle et al. (21) examined the effect of chemotherapy on survival in elderly patients with stage IV non–small cell lung cancer and compared these results with those of a randomized controlled trial. In that study, both the instrumental variable and propensity score analyses produced results that were similar to those of the randomized controlled trial, although median follow-up was much shorter for this acute disease, and there were stronger associations between observable clinical characteristics and survival among patients with lung cancer than among those with prostate cancer. Other studies have investigated the effects of invasive cardiac management on acute myocardial infarction survival (39) and adherence to two oral antidiabetic drug therapies that differ in patient tolerance, adverse events, and side effects (47). The acute myocardial infarction survival study compared multivariable regression, two propensity score methods, and instrumental variable analysis with randomized controlled trial findings and reported similar findings from the multivariable and two propensity score methods. However, the instrumental variable findings were comparable to the results from randomized controlled trials, indicating that there was selection bias that was caused by unobservable confounders that could not be adjusted by propensity score analysis. In the drug adherence study, multivariable, propensity score, and instrumental variable findings were similar, indicating that any selection bias was caused by observable factors because all three methods of adjusting for confounding produced similar results.
This study and previous studies (21,39,47) indicated that if unobservable factors were not a major source of bias, then instrumental variable and propensity score methods should provide similar results. Whether the instrumental variable and propensity score results support or contradict unadjusted results depends on the extent of selection bias in assigning patients to alternative treatments. These differences across studies also suggest that it may not be possible to generalize about the choice of a statistical method across different clinical conditions. Instrumental variable analysis in principle is the more robust approach because it adjusts for both observable and unobservable potential sources of bias. However, this outcome depends critically on the identification of a valid and plausible instrument, which is controversial because there is no definitive test of the instrument's lack of association with the health outcome and, if the instrument is not strongly associated with the treatment received, the estimate of the treatment effect will be highly imprecise. Thus, it is difficult to distinguish between a true lack of statistical significance between treatment outcomes and an imprecise statistical estimate from a weak instrument. The differences in the estimated hazard ratios between the two instrumental variable models that we used illustrate this concern.
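The loss of precision can be made concrete by backing out the standard error of the log hazard ratio from each reported confidence interval. The sketch below does this for the two all-cause mortality estimates given in the results; it assumes that the intervals are symmetric on the log scale with a 1.96 multiplier, which is a common convention but is not stated explicitly in the text.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a 95% CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (HR, lower, upper) for all-cause mortality, as reported in the results.
estimates = {
    "multivariable": (1.47, 1.35, 1.59),
    "instrumental variable": (1.09, 0.46, 2.59),
}

se = {name: log_se_from_ci(lo, hi) for name, (_, lo, hi) in estimates.items()}
ratio = se["instrumental variable"] / se["multivariable"]

for name, value in se.items():
    print(f"{name}: SE of log(HR) = {value:.3f}")
print(f"instrumental variable SE is {ratio:.1f} times the multivariable SE")
```

Under that assumption, the instrumental variable estimate carries a standard error roughly ten times that of the multivariable estimate, so its interval cannot exclude either no effect or an effect as large as the multivariable one.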
One advantage of using SEER-Medicare data for comparative effectiveness studies of alternative cancer treatments is that the lagged treatment pattern in the local geographic area, which we used in this study, is a potentially readily available choice for an instrumental variable, as long as there is sufficient variation across small geographic areas and there are enough patients in each treatment group to generate reasonably stable local area estimates. Similar treatment propensity measures can be constructed for other cancers. An important innovation in this study was that the instrumental variable was defined as the difference between the actual and predicted treatment proportions in the geographic area because the underlying characteristics of patients are not likely to be similar across geographic areas.
Findings from this study have important ramifications for physicians who rely on the medical literature to counsel newly diagnosed patients with localized prostate cancer regarding treatment and also patients who learn of newly published findings on the comparative effectiveness of various prostate cancer treatments in the popular press. Given the difficulties in successfully conducting randomized trials of prostate cancer treatments, observational data may form the preponderance of evidence that treating physicians will rely on to guide their discussions with patients. Many practicing physicians may not have the time or expertise to evaluate the biases inherent in observational reports published in academic journals. Thus, when observational data analyses are published without the appropriate methodology to account for observed and unobserved sources of bias, treating physicians may ascribe inappropriate validity to their findings when advising patients about treatment choice.
A recent study (13) and accompanying editorial (2) underscore the very real nature of this problem. Using a propensity score methodology, Wong et al. (13) found, as we did in this analysis, that aggressive management (either surgery or radiation) was associated with better survival than conservative management among older patients with prostate cancer. The accompanying editorial (2) noted that the findings of the study seemed counter to clinical intuition and that perhaps there was inadequate risk adjustment. Despite this concern, the article received substantial coverage in the lay press (48,49).
Sensitivity analyses (50,51) can reassure clinicians that the results are robust to alternative assumptions about the presence of a hypothetical confounder. For example, Wong et al. (13) showed that relative to their primary result that active treatment was associated with statistically significantly better survival than conservative management (HR for mortality = 0.69, 95% CI = 0.66 to 0.72), the effect of an omitted confounder would have to be large to generate a result of no difference in mortality. However, such a finding does not mean that there were no unobserved confounders or that actual treatment decisions might not be influenced by multiple unobserved factors, which alone might make a small contribution but in combination might influence treatment decisions in a systematic way. Moreover, our analysis indicated that unobserved confounding may in fact be large because we found that the propensity score–adjusted hazard ratios (HR range = 1.46–1.56) were higher than the instrumental variable estimate (HR = 1.09, 95% CI = 0.46 to 2.59) and the estimate from the randomized controlled trial (RR = 1.04, 95% CI = 0.77 to 1.40) (Table 3).
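A sensitivity analysis of this kind can be sketched with the standard external-adjustment formula for a single binary unmeasured confounder. This is a generic formula and not necessarily the method of references 50 and 51 or of Wong et al.; it is exact for risk ratios and only approximate for hazard ratios. The confounder prevalences of 10% and 40% are hypothetical values chosen for illustration.

```python
def bias_factor(gamma, p_treated, p_untreated):
    """Bias in an observed ratio from one binary unmeasured confounder.

    gamma: ratio by which the confounder multiplies the outcome rate.
    p_treated, p_untreated: confounder prevalence in each treatment group.
    """
    return (p_treated * (gamma - 1) + 1) / (p_untreated * (gamma - 1) + 1)


def externally_adjusted(observed, gamma, p_treated, p_untreated):
    """Observed ratio with the assumed confounder's bias divided out."""
    return observed / bias_factor(gamma, p_treated, p_untreated)


observed_hr = 0.69  # active vs conservative management, Wong et al. (13)

# Hypothetical prevalences of an unmeasured marker of poor prognosis.
p_active, p_conservative = 0.10, 0.40

for gamma in (1.5, 2.0, 2.76):
    adjusted = externally_adjusted(observed_hr, gamma, p_active, p_conservative)
    print(f"confounder effect {gamma:.2f}: adjusted HR = {adjusted:.2f}")
```

Under these assumed prevalences, a single confounder would need to raise mortality between twofold and threefold to move the observed hazard ratio to the null. Several weaker unmeasured factors that act in the same direction could produce a comparable shift, which is the concern raised above.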
For the sake of patients and health-care providers who use study results to make life-changing decisions, researchers need to use multiple methods of risk adjustment, such as propensity score reweighting and instrumental variable analysis, to confirm that the results are not sensitive to the method of risk adjustment. If differences are noted, the clinical plausibility and statistical validity of the various approaches should be reexamined and results should be considered with appropriate caution. Patient selection into specific treatments on the basis of factors related to prognosis is an important consideration in all observational studies, but particularly in studies involving cancer in which the incidence is highest in the elderly who are also most likely to have multiple comorbidities.
As we have noted, instrumental variable analysis does not guarantee that all observational data bias is eliminated. The variable(s) selected as the instrument should have a statistically significant association with treatment choice but not with the health outcome or with unobserved factors that influence the health outcome. Although there are guidelines for assessing whether an instrument is a strong predictor of the treatment received, there is no definitive test for an instrument's validity. If the instrument is weak, the extent of bias in the instrumental variable estimate may be greater than in the unadjusted observational data. For example, our alternative instrumental variable analysis that used both the lagged area treatment and measures of local area medical resources, which were not strongly related to the treatment received, resulted in much better all-cause survival (HR = 0.71, 95% CI = 0.31 to 1.59). Although this estimate had a wide confidence interval and the hypothesis of no difference in survival could not be rejected, its low point value could lead some to conclude that conservative management was associated with better survival than radical prostatectomy in this population of men between the ages of 66 and 74 years. Although use of multiple instruments may increase the ability to explain the treatment received, it can also increase the likelihood of an association between the instrumental variable and the health outcome. It is also important to recognize that the result of the instrumental variable analysis is limited primarily to the population on the treatment margin (ie, men who do not have strong indications that would favor one treatment approach over the other).
Given these caveats, however, if a conceptually plausible instrument that has a strong and statistically significant association with the treatment can be found, instrumental variable analysis should provide a potentially important alternative and complementary methodology to propensity score methods for assessing treatment outcomes without having to make any a priori assumptions about the potential magnitude of unobserved confounders.
There were several limitations in our study. First, the benchmark Scandinavian randomized controlled trial (28,29) that we used to assess the alternative statistical methods of adjusting for observational data bias is only one study and is not representative of all elderly prostate cancer patients in the United States. It was also limited to the comparison of radical prostatectomy and conservative management, excluding men who were treated by radiation therapy. Although we selected our study sample of patients from the SEER database to be as similar as possible to patients in the benchmark clinical trial, we were not able to include prostate cancer patients who were younger than 66 years, a group that represented approximately 46% of the trial population (28,29). The effectiveness of treatment varied in the two age groups (ie, <65 and ≥65 years), and we compared our findings with the population aged 65 years or older. Consequently, our analysis should be viewed primarily from a methodological perspective rather than as an analysis with direct implications for clinical treatment. Second, our sample was restricted to the approximately 85% of Medicare enrollees in fee-for-service plans. Prostate cancer stage at diagnosis has been reported to be similar in Medicare managed care and fee-for-service settings; however, among patients with clinically localized disease, treatment varies by setting (8). Patients with early-stage prostate cancer in managed care were less likely to receive radical prostatectomy and more likely to receive radiation or conservative management than similar patients in fee-for-service settings (8). Third, enrollment in Medicare managed care changed during the study period, which may also limit the generalizability of our findings to the managed care population. Fourth, we did not have information about prostate-specific antigen screening before diagnosis.
Use of prostate-specific antigen screening increased dramatically over the period of our study (52–54), and the number of men diagnosed with early-stage prostate cancer increased accordingly (55). Although we included year of diagnosis in our models, we could not identify which prostate cancer patients were diagnosed because of elevated prostate-specific antigen levels and which were diagnosed because of symptoms. Lastly, a complete statistical assessment of the Cox hazard model's proportionality assumption indicated that the effects of some covariates may not be time invariant, especially in the analysis of all-cause mortality. Although a sensitivity analysis of the effects of allowing time-varying covariates did not alter the principal findings with regard to treatment effects, further analysis of time-varying effects may be warranted.
In summary, survival after radical prostatectomy or conservative management in elderly patients with early-stage prostate cancer as calculated by instrumental variable estimation of Cox proportional hazard models from observational data was similar to that calculated by Cox proportional hazard models from clinical trial data. Consequently, instrumental variable analysis may be a useful technique in comparative effectiveness studies of prostate cancer and other cancer treatments if an acceptable instrument can be identified. Future research is warranted to evaluate additional methods of addressing confounding in observational data and to compare results from these methods with those from benchmark randomized controlled trials.
National Cancer Institute (HHSN2612007003 39P).
|Radical prostatectomy†||Exclusion||ICD-9 procedure codes 60.2-60.6; CPT-4 codes 55840, 55845, 55810, 55815||Surgery codes 30, 50, 80|
|Conservative management‡||Radiation||ICD-9 diagnosis codes V58.0, V66.1, V67.1; ICD-9 procedure codes 92.21-92.29; CPT-4 codes 77261-77299, 77300, 77305, 77401-77499, 77750-77799; Revenue center codes 0330 or 0333||Radiation codes 1-5, 7|
| ||Chemotherapy||ICD-9 diagnosis code V58.1; ICD-9 procedure code 99.25; CPT-4 96400-96549, 95990, 95991, 96530, G0355, G0357-G0359, J0640, J2405, J8520-J8999, J9000-J9164, J9166-J9201, J9203-J9216, J9220-J9999, K0415, K0416, Q0083-Q0085, Q0179, S0177, S0181; Revenue center codes 0331, 0332, or 0335||Chemotherapy codes 01, 02, 03|
| ||Hormonal therapy||J1950, J9217-J9219, J9202, J9165||Hormone therapy 01|
| ||Prostate cancer–directed surgery||ICD-9 procedure codes 60.2–60.6; CPT-4 codes 55840, 55845, 55810, 55815||Surgery codes 30, 50, 80|
Authors had full responsibility in the design of the study; the collection, the analysis, and interpretation of the data; the decision to submit the article for publication; and the writing of the article.