Arti Hurria, M.D.d, ahurria/at/coh.org, Phone: 626-256-4673 ext. 64173, Fax: 626-301-8898
Ilene H. Zuckerman, PharmD, Ph.D.b, izuckerm/at/rx.umaryland.edu, Phone: 410-706-3266, Fax: 410-706-5394
Stuart M. Lichtman, M.D.e, lichtmas/at/mskcc.org, Phone: 631-623-4100, Fax: 631-864-3827
Naimish Pandya, M.D.c, npandya/at/umm.edu, Phone: 410-328-2567, Fax: 410-328-6896
Arif Hussain, M.D.c, ahussain/at/som.umaryland.edu, Phone: 410-328-7225, Fax: 410-328-0805
Franklin Hendrick, B.S.b, fhend001/at/umaryland.edu, Phone: 301-919-0417, Fax: 410-706-5394
Jonathan P. Weiner, Dr. P.H.f, jweiner/at/jhsph.edu, Phone: 410-955-5661, Fax: 410-955-0470
Xuehua Ke, M.A.b, xke001/at/umaryland.edu, Phone: 410-706-1418, Fax: 410-706-5394
Martin J. Edelman, M.D.c, medelman/at/umm.edu, Phone: 410-328-2703, Fax: 410-328-0805
To develop and provide initial validation for a multivariate, claims-based prediction model for disability status (DS), a proxy measure of performance status (PS), among older adults. The model was designed to augment information on health status at the point of cancer diagnosis in studies using insurance claims to examine cancer treatment and outcomes.
We used data from the 2001–2005 Medicare Current Beneficiary Survey (MCBS), with observations randomly split into estimation and validation subsamples. We developed an algorithm linking self-reported functional status measures to a DS scale, a proxy for the Eastern Cooperative Oncology Group (ECOG) PS scale. The DS measure was dichotomized to focus on good [ECOG 0–2] versus poor [ECOG 3–4] PS. We identified potential claims-based predictors, and estimated multivariate logistic regression models, with poor DS as the dependent measure, using a stepwise approach to select the optimal model. Construct validity was tested by determining whether the predicted DS measure generated by the model was a significant predictor of survival within a validation sample from the MCBS.
One-tenth of beneficiaries met the definition for poor DS. The base model yielded high sensitivity (0.79) and specificity (0.92); positive predictive value=48.3% and negative predictive value=97.8%, c-statistic=0.92 and good model calibration. Adjusted poor claims-based DS was associated with an increased hazard of death (HR=3.53, 95% CI 3.18, 3.92). The ability to assess DS should improve covariate control and reduce indication bias in observational studies of cancer treatment and outcomes based on insurance claims.
Observational studies using administrative data are increasingly used to provide information on population-based patterns of cancer treatment and to evaluate treatment outcomes through comparative effectiveness research.1, 2 For older adults and non-elderly adults with disabilities, the Surveillance, Epidemiology and End Results (SEER) registry, linked to Medicare enrollment and claims data, has become an important resource for this research, as have other claims-based sources.3, 4 One challenge for researchers using administrative data is that treatment decisions integrate a variety of factors, including patient health status and patient or physician preferences and attitudes, not all of which can be measured fully in the available data. Most claims-based measures of health status rely on diagnosis codes to control for the presence of comorbidities at the time of cancer diagnosis.5 For example, diagnostic information has been used to create weighted indices, as in the case of the Charlson Comorbidity Index (CCI).6 However, dimensions of health status such as functional or performance status (PS) are difficult to characterize using the available methods.7, 8 To the extent that these poorly measured or unobserved factors are important determinants of treatment and survival, failure to take them into account can result in biased estimates.9
PS is a measure of patient functional capacity, with an emphasis on physical dimensions. PS incorporates the ability to work, time out of bed, and the ability to perform “self care.” An initial scale developed by Karnofsky was subsequently modified by the Eastern Cooperative Oncology Group (ECOG), and eventually adopted by the World Health Organization.10, 11 The scale is summarized in Appendix A. PS scores are commonly used throughout oncology practice as a general numerical guide to the cancer patient’s health. PS is assessed based on patient or proxy report of activity levels, combined with clinician observation of the patient’s mobility during a medical encounter. As a result of its prognostic value for survival, PS is used as a criterion for selection into clinical trials and as a key factor determining whether to actively treat cancer patients or provide supportive care. Treatment guidelines, for example, those promulgated by expert panels through the National Comprehensive Cancer Network, routinely tailor recommendations by patient PS.12 PS is usually assessed at diagnosis to determine an initial treatment strategy, and may be updated regularly as treatment response and disease progression require reassessment of treatment.13 Furthermore, most cancer clinical trials are restricted to patients with ECOG PS score 0–2, with many further restricted to patients with PS score 0–1 because patients with poor PS (>=3) are more likely to experience unacceptable toxicities and/or are less likely to experience survival benefit.14
Given the clinical relevance of PS in cancer patients, and the current limited ability to assess PS with existing claims-based comorbidity measures, we undertook the development and validation of a multivariate prediction model based on administrative claims to capture this dimension of health status. The resulting model can be used to augment health status information in research using administrative databases that lack PS or functional status information, but include administrative claims needed to operationalize the independent measures in the model. As we did not have direct measures of PS in the data used to develop the model, we first created a proxy measure which we refer to as disability status (DS) based on combinations of self-reported functional status measures.15 We describe the process of DS construction, model development, and initial validation steps for the DS model.
We used data from the Medicare Current Beneficiary Survey (MCBS), a nationally representative rotating panel survey of community-based and institutionalized Medicare beneficiaries.16 The MCBS samples approximately 5,100 new beneficiaries each year, with up to four years of observation. The survey captures information on demographics, insurance, and self-reported health and functional status, including limitations and dependence in activities of daily living (ADLs) and instrumental activities of daily living (IADLs), difficulty with activities requiring strength, stamina or agility, and participation in exercise. The MCBS is linked by the Centers for Medicare and Medicaid Services (CMS) to Medicare Parts A and B claims for participants in each survey year, which provide detailed information concerning service date, provider, type of service, and associated diagnoses. Specific procedures are reported using either ICD-9-CM procedure codes, the American Medical Association’s Current Procedural Terminology (CPT) codes, or the CMS Healthcare Common Procedure Coding System (HCPCS level II) codes. We used data from 2001, 2003, and 2005, as these years included the biennial questions on exercise. We selected one observation-year per beneficiary. Beneficiaries with any Medicare Advantage enrollment, or who resided in facilities other than assisted living or nursing homes (for example, group homes for persons with mental disabilities), were excluded, the latter due to limited functional status information.
The initial sample included both older (age >= 65 years) and non-elderly adults with disabilities enrolled in Medicare. In sensitivity analyses we determined that model fit was substantially improved when we limited the sample to older adults. The final sample of Medicare beneficiaries was randomly split into estimation and validation samples, with 7,394 enrollees in each.
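The random split into estimation and validation samples can be illustrated with a minimal sketch. The function name `split_sample`, the seed, and the use of integer IDs are assumptions made for illustration; the original analysis was performed in SAS, and its exact randomization procedure is not described here.

```python
import random

def split_sample(ids, seed=0):
    """Randomly split beneficiary IDs into two equal halves.

    Hypothetical helper: returns (estimation, validation) lists.
    """
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 2 x 7,394 enrollees, as reported for the final sample
estimation, validation = split_sample(range(14788))
```

Fixing the seed keeps the two subsamples stable across reruns, which matters when the cutpoint is tuned on one half and evaluated on the other.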
The MCBS is a sample of the general Medicare beneficiary population, not limited to patients with cancer. As a result, our model captures relationships between healthcare utilization and functional status, independent of the effects of cancer and its treatment. This is consistent with the approach used in the ECOG PS scale, and appropriate to capture baseline health status at the point of cancer diagnosis.
We developed a summary measure of DS to proxy for PS, based on self-reported measures of functional status, strength, stamina, and exercise, linked to the various functional dimensions and degrees of limitation specified in the ECOG PS scale. The approach, described in detail in Appendix B, was guided by a clinician panel representing a cross section of medical oncologists, including those with expertise in geriatrics, and thoracic, gastrointestinal, breast, and prostate cancers. For each of the individual functional status measures we specified a common 5 point limitation scale (A=‘no difficulty’, B=‘difficulty, no assistance needed/little difficulty/difficulty without help’, C=‘difficulty, personal assistance available (on standby)/some difficulty’, D=‘difficulty requiring equipment or assistance/a lot of difficulty’, and E=‘can’t do’). In our DS measure, we approximated the stepwise progression in the ECOG PS scale by grouping the functional status measures into those associated with heavy work, light work, and self-care activities, the dimensions used in the ECOG scale, and set thresholds for the number and degree of limitation associated with each level of disability. For example, the ECOG definition identifies patients with PS=3 as “capable of only limited self-care, confined to bed or chair more than 50% of waking hours,” and patients with PS=4 as “completely disabled, unable to carry out any self-care, and totally confined to bed or chair.” The DS algorithm required individuals with DS=3 to have limitations of at least level D in heavy work, in 3 of 5 measures of light work, and in dressing. Two out of three other measures of self-care were required to be at level C or greater. Requirements for DS=4 were substantially more restrictive. We created an algorithm that summarized information across the individual measures in each dimension, and then linked to the levels on the DS scale, assigning a survey-based DS level to each individual. 
We found that relatively few patients met the highest levels of activity limitations, so we shifted thresholds to achieve greater variation in DS assignment. For the models reported in this paper, we collapsed the 5-level DS measure into a dichotomous indicator for good DS (DS=0–2) versus poor DS (DS=3–4). The survey-based DS measures were used as the dependent variable for developing a prediction model using information from administrative claims as predictors.
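The DS=3 rule and the good/poor dichotomization described above can be sketched as follows. This is an illustrative reading of the text only, not the algorithm in Appendix B: the function names, the treatment of heavy work as a single summary level, and the interpretation of "3 of 5" and "two out of three" as "at least" are all assumptions.

```python
# Ordinal coding of the common limitation scale (A = no difficulty ... E = can't do)
LEVELS = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

def at_least(level, threshold):
    """True if a limitation level is at or beyond the threshold level."""
    return LEVELS[level] >= LEVELS[threshold]

def meets_ds3(heavy_work, light_work, dressing, other_self_care):
    """Hypothetical check of the DS=3 criteria as paraphrased in the text.

    heavy_work      -- one summary level (assumption)
    light_work      -- five levels, one per light-work measure
    dressing        -- one level
    other_self_care -- three levels for the remaining self-care measures
    """
    heavy_ok = at_least(heavy_work, "D")
    light_ok = sum(at_least(x, "D") for x in light_work) >= 3
    dressing_ok = at_least(dressing, "D")
    self_care_ok = sum(at_least(x, "C") for x in other_self_care) >= 2
    return heavy_ok and light_ok and dressing_ok and self_care_ok

def is_poor_ds(ds_level):
    """Collapse the 5-level DS scale: 0-2 is good, 3-4 is poor."""
    return ds_level >= 3
```

For example, `meets_ds3("D", ["D", "D", "E", "A", "B"], "D", ["C", "C", "A"])` satisfies all four conditions, whereas lowering the heavy-work level to "C" does not.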
We identified a broad group of potential explanatory variables that included indicators for health care services, which were expected to vary based on DS level. These indicators were culled from three sources: services identified by clinicians in response to a series of patient vignettes (“What type of healthcare services would a patient with this degree of functional limitation be expected to use more or less commonly?”); variants of the Berenson-Eggers Type of Service (BETOS) codes; and selected demographic measures, which we tested for model stratification or interaction. The BETOS groups comprise CPT and HCPCS codes, and are intended to represent readily understood clinical categories, such as ‘hip or knee replacement’, which are stable over time.17 The BETOS indicators were pulled from physician and hospice claims during the calendar year in which the respondent’s survey took place. We combined or split selected BETOS categories to better capture the potential clinical relationship between the service and DS. Healthcare services specifically provided to patients with physical limitations, such as home oxygen, mobility aids (walker, wheelchair), or nursing home admission, were expected to be associated with poor DS, while use of preventive services and elective surgical procedures was expected to be associated with better DS. Indicators were grouped into preventive services, evaluation and management (E&M) visits, other visit types, minor or ambulatory procedures, major procedures, imaging, durable medical equipment use, and other. We did not include indicators for chronic conditions or age, as DS is intended to capture a health status dimension independent of these factors.
We analyzed the distributions of the claims-based indicators overall and by survey-based DS (good/poor). Stepwise logistic regression predicting poor DS was used to select explanatory variables, using a significance level of 0.05 for both variable entry and exit. We selected the optimal model as the one with the lowest Akaike information criterion (AIC), suggesting greatest model efficiency.18–21 Analyses were performed using SAS version 9.2 (SAS Institute, Inc., Cary, N.C.).
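For reference, the model form and selection criterion described above can be written in standard notation. The symbols are introduced here for illustration and do not appear in the original text: $x_{ij}$ is the $j$-th claims-based indicator for beneficiary $i$, $k$ is the number of estimated parameters, and $\hat{L}$ is the maximized likelihood of the fitted model.

```latex
\operatorname{logit}\Pr(\text{poor DS}_i = 1)
  = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Among candidate models, a lower AIC indicates a better trade-off between fit and the number of parameters.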
We estimated models with main effects only, and then permitted interaction terms to enter the stepwise regression modeling process, allowing interaction effects to remain in the model only if accompanied by the corresponding main effects. We permitted interactions between the health care service indicators and both region of the country and enrollment in Medicaid or Medicare Savings Programs (MSP, in which state programs pay Medicare Part B premiums), to capture the effects of differences in practice patterns that might be affected by region or supplemental insurance. Although indicators for all supplemental insurance types were available in the MCBS, only information on Medicaid/MSP participation is available in SEER-Medicare. In sensitivity analyses we used a three-level measure of survey-based DS (0–1, 2, and 3–4) and estimated ordered logit models.
Model fit was tested by using the regression results to generate a predicted (claims-based) DS measure, converting the continuous prediction to a discrete (good/poor) indicator, and examining concordance-discordance between the predicted and survey-based DS.22 We examined the sensitivity and specificity associated with different cutpoints in the estimation sample, and selected the highest possible cutpoint that maintained a sensitivity of at least 80%. We chose this cutpoint as a reasonable approach to balance the trade-off between the false positive and false negative rates of estimated poor DS. Model calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test.23 Model fit was tested both within the estimation and validation samples, and among the subgroup with poor self-reported health status. Although the MCBS is designed to be representative of the Medicare population, there is oversampling of the oldest-old and younger beneficiaries with disabilities. We present unweighted analyses to be consistent with application in datasets where weights are not commonly employed; in sensitivity analyses we re-estimated the prediction models using MCBS sampling weights.
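The cutpoint rule described above (the highest cutpoint that still maintains the target sensitivity) can be sketched in a few lines. The function names and the toy data are hypothetical, and the sketch assumes that a prediction at or above the cutpoint is classified as poor DS and that candidate cutpoints are the observed predicted probabilities.

```python
def sensitivity_specificity(probs, labels, cutpoint):
    """Sensitivity and specificity when 'poor DS' is predicted for p >= cutpoint."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_poor = p >= cutpoint
        if y and predicted_poor:
            tp += 1
        elif y:
            fn += 1
        elif predicted_poor:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def highest_cutpoint(probs, labels, min_sensitivity=0.80):
    """Largest observed probability usable as a cutpoint with sens >= target."""
    best = None
    for c in sorted(set(probs)):            # ascending, so the last hit is the highest
        sens, _ = sensitivity_specificity(probs, labels, c)
        if sens >= min_sensitivity:
            best = c
    return best

# Toy data: label 1 = poor survey-based DS, 0 = good survey-based DS
probs  = [0.90, 0.80, 0.60, 0.30, 0.10, 0.02, 0.05, 0.20, 0.40, 0.70]
labels = [1,    1,    1,    1,    1,    0,    0,    0,    0,    0]
cut = highest_cutpoint(probs, labels)
```

Raising the cutpoint lowers sensitivity and raises specificity, so picking the highest cutpoint that meets the sensitivity floor gives the best specificity available under that constraint.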
To further validate our DS measure, we examined the relationship of both the survey-based and predicted claims-based DS measures with overall survival within the validation set of the MCBS. We calculated a follow-up period from October 1st (approximate timeframe for the Fall survey) of the relevant survey year until death. MCBS respondents who did not die were censored at December 31st, 2007. We estimated Cox proportional hazards (PH) models with either the survey-based or predicted DS, controlling for race, age, gender, educational attainment, marital status, household income, and residence location. We also included the CCI, to test whether the predicted DS measure had an independent association with survival. The project was approved by the University of Maryland Institutional Review Board.
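The follow-up and censoring definition above translates directly into a small helper. This is a sketch under stated assumptions: the function name `follow_up` is hypothetical, follow-up is measured in days, and a death recorded after the censoring date is treated as censored.

```python
from datetime import date

CENSOR_DATE = date(2007, 12, 31)   # administrative censoring date from the text

def follow_up(survey_year, death_date=None):
    """Return (follow-up in days, event indicator) for one respondent.

    Follow-up starts on October 1 of the survey year. The event indicator
    is 1 for an observed death and 0 for a censored observation.
    """
    start = date(survey_year, 10, 1)
    if death_date is not None and death_date <= CENSOR_DATE:
        return (death_date - start).days, 1
    return (CENSOR_DATE - start).days, 0

censored = follow_up(2005)                     # survived to end of follow-up
died = follow_up(2001, date(2002, 10, 1))      # died one year after baseline
```

The resulting (time, event) pairs are the inputs a Cox proportional hazards routine would expect, alongside the DS indicator and the adjustment covariates.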
The characteristics of the estimation and validation samples are provided in Table 1. Just under one tenth (9.3%) of beneficiaries met the definition for poor survey-based DS. Table 2 reports the prevalence by survey-based DS category of the claim indicator variables that appeared in either of the final models. The most common category of procedures was immunizations/vaccinations, reported for 45.8% of beneficiaries overall, with rates of 48.3% in the good DS group and 21.5% in the poor DS group. Wheelchair claims were present for 4.9% overall, with rates of 3.4% among those with good DS and 19.5% among those with poor DS.
The estimated coefficients from the logistic regression models, with and without interactions between the service indicators and either Medicaid/MSP enrollment or region of the country, are provided in Appendix C. We find that many relationships are consistent with clinical intuition; for example, nursing home stays, home care, and ambulance use are associated with an increased probability of poor DS, while vaccinations, screenings, and cardiac monitoring and stress tests are associated with a lower probability of poor DS. For the model without interactions the c-statistic was 0.92, and the Hosmer-Lemeshow statistic indicated good model calibration. We identified cutpoints of 0.115 and 0.110 for models with and without interactions, respectively, to generate the dichotomous indicator of good/poor predicted DS. Table 3 summarizes the within-sample concordance and out-of-sample predictive ability. Generally, the two models performed similarly. In the model without interactions within the validation half of the sample, the positive predictive value was 48.3% and the negative predictive value was 97.8%. Predictive value is a function of the underlying distribution of the dependent variable, survey-based DS. When we subset to those with self-reported poor health status, the positive predictive value of predicted poor DS was improved (69.3%), while the negative predictive value was somewhat lower (84.2%). It is important to note that although the probability cutpoints were chosen to maximize sensitivity and specificity, different situations might require focus on other aspects of model performance such as positive predictive value, and the cutpoints can be adjusted accordingly.
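The dependence of predictive value on the underlying prevalence of poor DS follows from Bayes' rule, as the sketch below illustrates. The function name is hypothetical. The inputs are the rounded sensitivity, specificity, and overall prevalence quoted in this paper, so the output only approximates the validation-sample values in Table 3; the 30% prevalence in the second call is an arbitrary illustrative value, not a figure from the study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value from test characteristics."""
    tp = sensitivity * prevalence                  # true positive share
    fp = (1 - specificity) * (1 - prevalence)      # false positive share
    tn = specificity * (1 - prevalence)            # true negative share
    fn = (1 - sensitivity) * prevalence            # false negative share
    return tp / (tp + fp), tn / (tn + fn)

# Rounded values reported in the text: sens 0.79, spec 0.92, prevalence 9.3%
ppv, npv = predictive_values(0.79, 0.92, 0.093)

# Same model applied to a hypothetical subgroup where poor DS is more common
ppv_high, npv_high = predictive_values(0.79, 0.92, 0.30)
```

With sensitivity and specificity held fixed, a higher prevalence raises the positive predictive value and lowers the negative predictive value, which is the direction of the change reported for the subgroup with poor self-reported health.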
In sensitivity analyses using a three-category dependent variable (DS 0–1, 2, 3–4), we found that our model was unable to distinguish the middle group (DS=2) from the DS 0–1 group, resulting in poor specificity (results not shown). Weighted sensitivity analyses resulted in only negligible changes to the models and predictions (results not shown).
Table 4 provides the results of the Cox PH models examining the effect of the survey-based and predicted DS measures on overall survival. Median survival time for individuals with poor survey-based DS was 23 (95% CI 21, 27) months, compared to 33 (95% CI 30, 36) months for individuals with poor claims-based DS. Median survival time was not reached among individuals with either good survey-based DS or good claims-based DS. In the adjusted models 2 and 3, we find that the hazard ratio of death for individuals with poor survey-based DS (HR=3.97; 95% CI 3.55, 4.44) is close to the hazard ratio for poor claims-based DS (HR=3.53; 95% CI 3.18, 3.92). When the CCI is included in model 4, we find that the claims-based DS point estimate is smaller but remains significant (HR=2.85; 95% CI 2.56, 3.17).
Our results demonstrate that healthcare service use indicators from administrative claims can be used to predict DS, and that the resulting predicted value is associated with survival in an older adult Medicare population. The explanatory variables were identified through a combination of clinician judgment and observed patterns within the data. As a result, the set of candidate variables was both clinically relevant and prevalent in the data. The explanatory variables in the final models reflect those associated with both good and poor DS. In a recent study examining the association between physician-reported PS and claims-based service use measures in lung cancer patients (Salloum et al. 2011) and in selected studies examining determinants of cancer treatment and outcomes, indicators expected to be associated with poor PS were found to be associated with lower likelihood of aggressive cancer therapy.24–28 Inclusion of selected services that predict good DS is unique to this study. These latter measures were found to be significant in both bivariate comparisons and the stepwise regression modeling process.
Overall, we identified approximately 9% of older adult Medicare beneficiaries as having poor survey-based DS. There is little information available concerning PS of a broad population of cancer patients at diagnosis, since PS is not collected routinely unless patients are entered into clinical trials.29 One study that examined ECOG PS among a consecutive series of 363 cancer patients reported that 67% of males and 80% of females presented with good ECOG PS (PS of 0–1), with rates of good ECOG PS declining with increasing age.30 We identified good survey-based DS in 90% of our overall sample, with 93% for males and 89% for females. Hence, our rates of good DS are somewhat higher, likely because we included DS=2 in the “good” category and because we examined a general sample of older adults.
Our prediction models performed well in both the estimation and validation samples. However, the models did not perform as well among the Medicare enrolled non-elderly with disabilities. In exploratory analyses we found different distributions of claims indicators, especially lower use of preventive services among the adults with disabilities when compared to older adults reporting similar levels of functional status impairments. Conceivably, adults with long term disabilities may have different perceptions of their physical limitations, which may alter the relationship between reported functional status and healthcare service use.
The strong negative effect of both poor survey-based and predicted DS on overall survival provides initial validation that both measures capture the construct of PS. The slightly smaller magnitude of the effect for the claims-based predicted measure is to be expected, as there will be some error in the model-based prediction relative to the survey-based measure. Finally, the retention of strong statistical significance after inclusion of the CCI in the survival models provides initial support for the independent significance of the DS measure, and for its likely value in improving covariate control in cancer-specific analyses of treatment and outcomes.
There are several limitations to our study that should be noted. Our dataset does not include a direct measure of ECOG PS. Instead, we constructed DS which serves as a proxy measure. Ideally we would have worked with a dataset that included both ECOG PS and the insurance claims. However, ECOG PS is routinely documented primarily in clinical trial records, which are usually specific to a single type of cancer, generally limited to patients with good ECOG PS, and not generally linked to claims. Hence, the data would not support the type of modeling question we address. The approach we used, relying on expert opinion and an iterative process to develop our DS proxy measure, should have resulted in a reasonable approximation for ECOG PS.
Our report of a predictive model to capture DS adds a new dimension to the use of diagnostic or procedure codes from insurance claims to capture and control for the effects of health status on treatment and outcomes. We have identified several areas for further validation of our model, and to explore how it may be used in research to examine comparative effectiveness, disparities, and quality. To further validate our model, we have implemented the DS prediction in previously developed SEER-Medicare analytic files and will test the ability to explain observed treatments and outcomes. We envision that this prediction model could also become a tool used in quality measurement. For example, it might be a mechanism to identify population level patterns of aggressive treatment for patients with poor DS, when there is a less favorable risk:benefit ratio. More general predictions of functional status may be useful in all of these roles, when applied to patients with chronic conditions other than cancer.
Cancer is the second leading cause of mortality in the U.S.31 The burden of cancer is particularly heavy for the elderly population, where cancer incidence rates are almost 10 times that of the under-65 population.32 Continued research to evaluate the benefits of cancer treatment in older patients is essential to public policymakers and medical insurers.33 The ability to assess DS from claims data should significantly improve the validity of results from observational studies using administrative claims data such as SEER-Medicare. This research will help us identify how the risk/benefit ratio of therapy varies among patients with different DS levels. This tool will also permit improved covariate control, and reduce treatment selection bias in observational studies designed to examine the effectiveness of new therapies as they are adopted for treatment of older patients.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.