Performance status (PS) is a good prognostic factor in lung cancer and is used to assess the appropriateness of chemotherapy. Researchers studying chemotherapy use are often hindered by the unavailability of PS in automated data sources. To our knowledge, no attempts have been made to estimate PS using claims-based measures. This study explores the ability to estimate PS using routinely available claims-based measures.
A cohort of insured patients aged 50 and older who were diagnosed with stage II–IV lung cancer between 2000 and 2007 was identified via a tumor registry (n=552). PS was abstracted from medical records. Automated medical and pharmaceutical claims from the year preceding diagnosis were linked to tumor registry data. A logistic regression model was fit to estimate good vs. poor PS in a random half of the sample. C statistics, sensitivity, specificity, and R2 were used to compare the predictive ability of models that included demographic factors, comorbidity measures, and claims-based utilization variables. Model fit was evaluated in the other half of the sample.
PS was available in 80% of medical records. The multivariable regression model predicted good PS with high sensitivity (0.88/0.94 depending upon how good PS was defined), but moderate specificity (0.45/0.32) with a 0.50 prediction cutoff, and good sensitivity (0.64/0.83) and specificity (0.69/0.55) when the cutoff is 0.70. Goodness-of-fit c-statistic was 0.76/0.78.
PS can be estimated, with some accuracy, using claims-based measures. Emphasis should be placed on documenting PS in medical records and tumor registries.
Since 1997 evidence-based guidelines have recommended the use of chemotherapy treatment for medically fit patients with lung cancer to improve survival, symptoms, and quality of life.1–4 Despite these recommendations, numerous studies5–8 have illustrated variability in receipt of chemotherapy treatment among patients with lung cancer. Yet, the ability to determine the appropriateness of observed treatment variability has been greatly hindered by voids in the clinical information necessary to judge appropriateness.
One key factor in evaluating the appropriateness of chemotherapy treatment is the patient’s performance status (PS).1–4 Performance status is a subjective composite measure used by clinicians to assess current functional capacity, and the likelihood of adverse events, quality-of-life, and survival after treatment. Measures of PS are currently not available through automated medical claims, tumor registries, or other observational data commonly used to study cancer treatment and its associated outcomes. Thus, use of such data to address questions regarding chemotherapy treatment has been relatively limited and when undertaken the inability to consider PS is a noted limitation.8–10 The systematic lack of information on PS similarly impedes the ability of researchers to use existing automated, observational data for comparative effectiveness research.
This article asks two questions. First, how often are measures of a patient’s PS documented in his or her detailed medical record? Second, is it possible to accurately estimate a patient’s PS using routinely available tumor registry and claims-based measures on that patient’s demographics, comorbidities, and prior healthcare utilization? Using a cohort of lung cancer patients diagnosed between 2000 and 2007, the feasibility of using medical record documentation to obtain PS measures is described overall and by patient characteristics. We then combine medical record documented PS information with information routinely available in an automated tumor registry as well as medical and pharmaceutical claims data to evaluate the feasibility of estimating PS among lung cancer patients using information routinely available in observational data sources. To our knowledge, this has not previously been attempted among patients with lung, or other, cancers.
Study patients were those receiving care from a 900-physician member, multi-specialty, salaried medical group practice in southeast Michigan. Data available from the medical group’s tumor registry were used to identify all patients aged 50 and older who were diagnosed with lung cancer between January 1, 2000 and December 31, 2007. The medical group, which provides care under both fee-for-service and capitated arrangements, staffs 27 primary care clinics throughout Detroit and the surrounding metropolitan area. Patients eligible for study inclusion were those continuously enrolled in an affiliated health plan (i.e., health maintenance organization) for the one-year period preceding their date of lung cancer diagnosis. Patients for whom no stage was available at the time of diagnosis or for whom the stage at diagnosis was 0-I were excluded, as chemotherapy treatment was not indicated for patients staged 0 or I during this time period.11 The Medical Group’s Institutional Review Board (IRB) approved all aspects of the study protocol.
The two most commonly used PS systems are the Eastern Cooperative Oncology Group (ECOG) scale, and the Karnofsky Performance Scale (KPS).12 Although the two scales are not identical they are generally thought to capture the same conceptual domain and conversions are possible between them (Table 1).13
Two trained chart abstractors reviewed inpatient and outpatient nursing and physician notes available within the patient’s electronic medical record from two months prior to diagnosis until the first of death, disenrollment, initiation of chemotherapy, or six months following diagnosis. If available, abstractors documented specific numeric performance scores and the scale used (i.e., ECOG or KPS). Patients were assigned a ‘good’ PS if they had an ECOG score of 0 or 1 or a KPS score of 80–100. A ‘poor’ PS was assigned to patients with an ECOG score of 2–5 or a KPS score of 0–70. This was done to be consistent with standards in practice regarding recommendations for chemotherapy use among lung cancer patients during the study period1–4 as well as with existing research applications.14 With the issuance of the 2009 ASCO guidelines, the standard for chemotherapy use changed to include consideration of use for those with an ECOG score of 2 or a KPS score of 60–70. Thus, we also present alternative results in which patients with these scores are realigned to ‘good’ PS.
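The two dichotomizations described above can be sketched as a small function. This is an illustrative sketch only: the function name `classify_ps` and the flag `ps2_is_good` are assumptions introduced here, not part of the study's abstraction tool.

```python
def classify_ps(scale: str, score: int, ps2_is_good: bool = False) -> str:
    """Dichotomize a numeric performance score into 'good' or 'poor'.

    ps2_is_good=False follows the primary (pre-2009) definition;
    ps2_is_good=True realigns ECOG 2 / KPS 60-70 to 'good'.
    """
    scale = scale.upper()
    if scale == "ECOG":
        if not 0 <= score <= 5:
            raise ValueError("ECOG scores range from 0 to 5")
        cutoff = 2 if ps2_is_good else 1   # lower ECOG = better function
        return "good" if score <= cutoff else "poor"
    if scale == "KPS":
        if not 0 <= score <= 100:
            raise ValueError("KPS scores range from 0 to 100")
        cutoff = 60 if ps2_is_good else 80  # higher KPS = better function
        return "good" if score >= cutoff else "poor"
    raise ValueError("scale must be 'ECOG' or 'KPS'")


print(classify_ps("ECOG", 2))                    # primary definition
print(classify_ps("ECOG", 2, ps2_is_good=True))  # alternative definition
```

Keeping the alternative definition behind a single flag makes it straightforward to produce both sets of results from the same abstracted scores.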
If no numeric score was documented, abstractors collected medical record documentation of ‘good’ or ‘poor’ PS. If no reference to PS was documented in the medical record, notes regarding the patient’s functionality (e.g., references to shortness of breath, use of wheelchair or other personal mobility devices, labor force participation, exercising habits, activities of daily living or other references to mobility) were recorded and used to estimate PS. Interrater reliability between the two abstractors was assessed on a random subset of 40 observations. The resulting Cohen’s kappa was 0.88. Within this subset, in each of the three instances where the abstracted PS did not match between the two abstractors, one abstractor indicated ‘good’ or ‘poor’ while the other selected ‘unknown’ PS. For the final analytical database, these differences were reconciled by choosing ‘good’/‘poor’ over ‘unknown’.
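For readers unfamiliar with the agreement statistic, Cohen's kappa can be computed as follows. This is a minimal sketch; the ratings in the example are invented for illustration and are not the study's 40 abstracted observations.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (NOT study data).
a = ["good", "good", "poor", "poor"]
b = ["good", "poor", "poor", "poor"]
print(cohens_kappa(a, b))
```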
Automated tumor registry and claims data were used to obtain patient demographic characteristics, cancer stage, and diagnoses for each patient. Demographic measures included age, gender, and race. The age of the patient (in years) was recorded as of lung cancer diagnosis date. Clinical variables examined included stage at diagnosis and the Charlson comorbidity index.15 Cancer stage was reported using the American Joint Committee on Cancer (AJCC) stages II–IV. A dichotomous variable was created to control for AJCC stage IV patients in the regression. The Deyo adaptation of the Charlson Comorbidity Index and each of its component diagnostic subgroups were constructed using inpatient and outpatient diagnostic information available in the 12-month period preceding diagnosis.16 In addition, claims data provided information on prescription drugs dispensed and medical care utilization in the 12-month period preceding lung cancer diagnosis.
Medical care utilization measures included those reflective of inpatient stays in a short stay hospital or skilled nursing facility (SNF), ambulatory care visits, emergency department visits, and utilization of home health services, same day surgery, and durable medical equipment (DME). For each person, inpatient utilization measures included the total number of distinct inpatient stays, the total number of inpatient days, and, for those with a nonzero number of stays, the average length of an inpatient stay. The number of outpatient visits was recorded, and in the regression a dichotomous variable was created to control for patients with nonzero outpatient visits. Similar dichotomous variables were constructed to reflect any drug dispensing and any DME utilization. The emergency department, home health, and same day surgery utilization variables measured the counts of visits incurred. We also evaluated the use of a count of the distinct number of medications dispensed during the baseline year, as recommended by Schneeweiss and colleagues.17 For this measure, medications whose first eight digits of the American Hospital Formulary Services code were equal were considered to be the same drug.18
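The distinct-medication count described above could be derived from dispensing records along these lines. This is a sketch under stated assumptions: the function name is hypothetical, the code strings in the example are made up, and the handling of separators (stripping non-digits before taking the first eight digits) is an assumption about how the codes are stored.

```python
def count_distinct_drugs(ahfs_codes, prefix_len=8):
    """Count distinct drugs, treating codes that share their first
    `prefix_len` digits as the same drug."""
    # Assumption: codes may contain separators such as ':' or '.'.
    digits_only = ("".join(ch for ch in code if ch.isdigit()) for code in ahfs_codes)
    return len({d[:prefix_len] for d in digits_only if d})


# Hypothetical dispensing records for one patient's baseline year.
codes = ["24:32.04.00.01", "24:32.04.00.02", "28:08.04.00"]
print(count_distinct_drugs(codes))  # the first two share an 8-digit prefix
```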
Among the cohort of lung cancer patients, we report the frequency of documented PS in medical records, and describe the different ways PS was recorded. Systematic differences between patients for whom PS was recorded and patients for whom it was not recorded were examined using two-sample t-tests (or Wilcoxon rank sum tests) and chi-squared tests, depending on the nature of the characteristic. Similar analyses were conducted to compare unadjusted differences in patient characteristics by ‘good’ PS vs. ‘poor’ PS. Multivariable logistic regression models were fit to evaluate the feasibility of using routinely available observational data to predict ‘good’ vs. ‘poor’ PS. Three separate models were estimated, reflective of three different levels of comprehensiveness of observational data routinely available. The first regression model included only those variables typically available via tumor registries (demographics and stage). The second model included those same variables, plus measures of medical care utilization and diagnoses available in medical claims data. The third model added measures of prescription drug use routinely available via pharmaceutical claims.
For each model, a split sample cross-validation was used to check for model overfitting. C statistics, sensitivity, specificity, and R2 were used to assess and compare the predictive ability of the different models. Initially, all variables were considered for inclusion. However, the final model in each of the three categories was fit using the stepwise-elimination method. Pairwise interactions were tested but did not enhance model prediction. Likewise, we evaluated the need to account for the non-independence of patients seen by the same physician. Because the intra-class correlation coefficient (ICC) was negligible (ICC = 0.01), we elected not to model this clustering, which also allowed additional assessments of model fit. The final models were estimated on the full sample and bootstrapping was used to replicate each final model 1000 times to create 95% confidence intervals around the c and R2 statistics.19
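As an illustration of the c statistic and a bootstrap confidence interval, consider the sketch below. It simplifies the procedure described above: it resamples patients with fixed predicted probabilities and takes percentile limits, rather than refitting the regression in each replicate. The function names, the seed, and the percentile method are assumptions, not the study's SAS implementation.

```python
import random


def c_statistic(y_true, y_prob):
    """Probability that a random 'good' PS patient (1) has a higher
    predicted probability than a random 'poor' PS patient (0)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    # Ties count as half-concordant.
    score = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return score / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the c statistic."""
    c_statistic(y_true, y_prob)  # fails fast if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(c_statistic(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Refitting the model inside each replicate, as the text describes, would additionally capture the variability introduced by estimation and variable selection.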
To examine model discrimination, patients were ranked by their predicted probability of ‘good’ PS based on each model. Patients were then divided into deciles based on increasing predicted probability of ‘good’ PS and actual ‘good’ PS rates were reported among patients in all deciles to suggest how well models separated patients with ‘good’ PS from those with ‘poor’ PS.20
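The decile comparison can be sketched as below. This is a minimal illustration; the function name and the rule for splitting patients into near-equal groups are assumptions.

```python
def decile_table(y_true, y_prob, n_groups=10):
    """Rank patients by predicted probability of 'good' PS and report,
    per group, (group number, size, actual rate, mean predicted rate)."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        # Integer arithmetic yields groups whose sizes differ by at most one.
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        actual = sum(y_true[i] for i in members) / len(members)
        predicted = sum(y_prob[i] for i in members) / len(members)
        rows.append((g + 1, len(members), actual, predicted))
    return rows
```

A model that discriminates well shows actual ‘good’ PS rates rising steadily from the lowest to the highest group.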
We used SAS software, version 9.1.3 (SAS Institute Inc, Cary, North Carolina) for all analyses, and we considered p<.05 to be statistically significant.
Five hundred and fifty two patients met the criteria for study eligibility. The mean age at diagnosis was 67.4 (standard deviation [SD], 9.1). Of the study-eligible patients, 42 percent were female, while the race distribution was 69 percent white and 31 percent black. The AJCC staging distribution was as follows: 9 percent were stage II, 20 percent stage IIIA, 19 percent stage IIIB, and 52 percent stage IV. The average Charlson comorbidity score across the eligible sample was 2.8 (SD, 3.4), while the average number of distinct prescription drugs used in the year prior to diagnosis was 9.3 (SD, 7.1).
The average number of inpatient days in the year preceding diagnosis for the cohort—including those with no inpatient stays—was 2.9 days (SD, 7.5), while the average number of inpatient stays was 0.5 (SD, 0.8), resulting in an average inpatient length of stay of 5.0 days (SD, 5.2). The average number of outpatient visits was 5.7 visits (SD, 8.5) and the average number of emergency department visits was 0.6 (SD, 1.1) for the same time period. Across the study-eligible sample, 28 percent recorded any home health utilization, 3 percent had same day surgery, 12 percent incurred a DME dispensing, and 4 percent incurred a stay in a rehabilitation or skilled nursing facility. None incurred a hospice stay in the period preceding lung cancer diagnosis.
Of the 552 study eligible patients, PS was recorded in the medical record for 261 cases (47 percent). Among these, a numeric score was documented in 248 cases (95 percent), with the ECOG scale most often used (74 percent). For the remaining 13 patients, although a numeric score was not documented, explicit documentation was found of either ‘good’ or ‘poor’ performance score.
Among the 291 (53%) patients for whom PS was not recorded, there were 181 for whom there was a sufficient verbal description of the patient’s functioning in either the physicians’ notes, nurses’ notes, or a combination of both to enable a determination of either a ‘good’ or ‘poor’ PS score. Thus, overall there were 442 patients (80%) for whom PS was determinable in their medical record.
Differences in patient characteristics by PS documentation level are reported in Table 2. The first two columns compare those patients for whom medical record documentation could be used to determine PS (known PS) with those for whom medical record documentation was insufficient to determine PS (‘unknown’ PS). As shown, patients with ‘unknown’ PS (n=110) did not differ significantly from those with a known PS (n=442) in terms of demographic or clinical characteristics or measures of medical care utilization.
Among patients with a known PS, the third and fourth columns of Table 2 compare patient characteristics between those who had a documented PS (either numeric or verbal) and those for whom a PS was extrapolated based on notes in the medical record. No significant differences were observed for most measures. However, there were significant differences by gender, diagnosis of atherosclerotic cardiovascular disease, and the average number of inpatient days.
Among the 442 patients for whom PS is known, 290 patients (66%) had a ‘good’ PS using the pre-2009 definition of ‘good’ and 152 (34%) had a ‘poor’ PS. This changes to 76% ‘good’ and 24% ‘poor’ when those with a documented numeric performance score of 2 are considered ‘good’, as would be consistent with the 2009 ASCO guidelines for the use of chemotherapy. The unadjusted differences in patient characteristics by PS are illustrated in Table 3. Compared to patients with ‘good’ PS, patients with ‘poor’ PS were significantly older (69.7 vs. 66.4 years), more likely to be male (66% vs. 54%), more likely to have stage IV disease (64% vs. 44%), and had a significantly higher Charlson comorbidity score (3.6 vs. 2.4). Consistent with the latter finding, patients with ‘poor’ PS were significantly more likely to have been diagnosed with several of the individual components of the comorbidity index when compared to those with ‘good’ PS. Patients with ‘poor’ PS also incurred significantly more inpatient days (5.5 vs. 1.7) as well as longer lengths of stay (6.8 vs. 5.4 days) in the year prior to diagnosis, and were more likely to have incurred any outpatient visit, home health utilization, or DME use in the year prior to diagnosis. Also of note is that patients with ‘poor’ PS were significantly less likely to have undergone chemotherapy in the year following diagnosis (42% vs. 82%) (data not shown). Similar differences between the groups are found when those with PS = 2 are realigned with the ‘good’ PS group, with two exceptions: statistically significant differences in gender and the prevalence of peripheral vascular disease no longer exist.
Results from the logistic regression models predicting ‘good’ vs. ‘poor’ PS defined the two ways are presented in Table 4. Results are presented for models fit on the full sample and include only significant (p<0.05) variables per the stepwise regression. In the model which included only tumor registry variables, only age at diagnosis and AJCC stage were selected. Diagnosis of chronic pulmonary disease, the number of inpatient stays, any outpatient visits, and the number of emergency department admissions were all added when information from medical claims data was considered. One more variable, the number of distinct prescription drugs, was added when information from pharmaceutical claims data was considered (model 3).
Statistical performance improved with the inclusion of additional explanatory variables (Table 4). Cross-validated c and R2 values were never more than 0.01 smaller than fitted values. Using a predictive threshold of 0.50, high sensitivity (0.88/0.94, depending upon how ‘good’ PS is defined) was obtained with the best model (Model 3) but moderate specificity (0.45/0.32). Increasing the predictive threshold to 0.70 continued to yield relatively high sensitivity (0.64/0.83) and more moderate specificity (0.69/0.55) regardless of how ‘good’ performance score is defined.
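The trade-off between the 0.50 and 0.70 cutoffs can be illustrated with a short sketch. The data below are invented for illustration, and classifying a patient as ‘good’ PS when the predicted probability is at or above the threshold (rather than strictly above it) is an assumption.

```python
def sens_spec(y_true, y_prob, threshold):
    """Sensitivity and specificity for predicting 'good' PS (coded 1)
    when predicted probability >= threshold is classified as 'good'."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted_good = p >= threshold
        if y == 1:
            tp += predicted_good
            fn += not predicted_good
        else:
            fp += predicted_good
            tn += not predicted_good
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes are required")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes and predicted probabilities (NOT study data).
y = [1, 1, 1, 0, 0]
p = [0.9, 0.75, 0.6, 0.65, 0.3]
print(sens_spec(y, p, 0.50))  # lower cutoff: higher sensitivity
print(sens_spec(y, p, 0.70))  # higher cutoff: higher specificity
```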
Table 5 shows the actual and predicted ‘good’ PS rates for patients within each of the 10 deciles. As measured by the Hosmer-Lemeshow chi-square statistic,20 all models have good calibration, where actual and predicted rates within each of the 10 deciles were not significantly different (p=0.69, p=0.32, p=0.13, for models 1–3 respectively) when PS = 2 is defined as ‘poor’ and likewise not significantly different (p=0.92, p=0.63, p=0.98, for models 1–3 respectively) when PS = 2 is defined as ‘good.’ Model discrimination was also improved with the inclusion of more explanatory variables.
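The calibration statistic referred to above can be sketched as follows. This is a simplified illustration that returns only the chi-square statistic and its degrees of freedom (groups minus two); obtaining the p-value requires a chi-square distribution function, which is omitted here. The function name and grouping rule are assumptions.

```python
def hosmer_lemeshow(y_true, y_prob, n_groups=10):
    """Hosmer-Lemeshow chi-square statistic over groups formed by
    ranking patients on predicted probability."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    n = len(order)
    stat, used = 0.0, 0
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not members:
            continue
        size = len(members)
        observed = sum(y_true[i] for i in members)   # actual 'good' PS count
        expected = sum(y_prob[i] for i in members)   # predicted 'good' PS count
        mean_p = expected / size
        # Groups with mean probability of exactly 0 or 1 would divide by
        # zero, so they contribute nothing to the statistic in this sketch.
        if 0 < mean_p < 1:
            stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
        used += 1
    return stat, used - 2
```

A small statistic relative to its degrees of freedom (and hence a large p-value) indicates that actual and predicted rates do not differ significantly across groups.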
Among a contemporary cohort of patients with stage II–IV lung cancer, we found explicit medical record documentation of PS less than half the time (47%). Review of nursing and physician notes made PS determinable via medical records 80% of the time. Given the central role that PS plays in clinical decision making among lung cancer patients, the lack of consistent medical record documentation is troubling. Where PS could be determined, its distribution in the cohort (34% ‘poor’ PS when those with PS = 2 are considered ‘poor’) was identical to the 34% with ‘poor’ performance status reported by Lilenbaum et al in contemporary clinical studies.14
We found that ‘poor’ PS among lung cancer patients with stage II–IV disease can be predicted reasonably well regardless of whether PS = 2 is considered ‘good’ or ‘poor.’ Furthermore, this was true regardless of the level of comprehensiveness of data used, but particularly for models that used information routinely available in medical claims data, or in medical and pharmaceutical claims data combined, where c-statistics were all above 0.70. While the inclusion of information routinely available in medical claims data marginally improved model fit and predictive accuracy when compared to a model fit using only data available in tumor registries, the inclusion of information from pharmaceutical claims data did not substantively alter model fit, regardless of how ‘good’ PS is defined.
To our knowledge this is the first study to use observational data to estimate PS for lung, or any other, cancer patients. As such, these findings represent an important contribution to the field. These findings are important to our ability to monitor quality of care and appropriateness of chemotherapy, and to our ability to prospectively identify patients who may be appropriate (but not targeted) for clinical trial or palliative care/hospice enrollment without relying on expensive and time consuming primary data collection methods. Predictive models such as the ones presented here that rely on data routinely available within large, observational databases can also be used to augment comparative effectiveness research, including comparisons of different chemotherapy regimens as well as receipt of chemotherapy vs. non-chemotherapy treatment and thereby greatly enhance the capabilities of existing electronic databases such as that available via SEER-Medicare data.
While our findings of significant differences in chemotherapy receipt by ‘good’ vs. ‘poor’ PS add face validity to the accuracy of the PS score abstracted from the medical record, the fact that 42% of patients with medical record documented ‘poor’ PS received chemotherapy in the year following diagnosis highlights the importance of attempts such as ours to make documented PS or PS proxies more readily available to those who monitor and study cancer care quality and outcomes. At the time of this study, national clinical practice guidelines for patients with non-small cell lung cancer unequivocally recommended chemotherapy for patients with PS 0 or 1.1,3 These guidelines suggested that chemotherapy might “possibly” be of benefit in patients with PS 2, noting that those patients had been excluded from clinical trials. This was in line with expert opinions of the time.21 More recent data have shown survival and quality of life benefits for PS 2 patients, although smaller than those for patients with ‘good’ PS, and the most recent ASCO guidelines are more supportive of chemotherapy treatment for patients with PS 2.22 Routine lung cancer chemotherapy among patients with PS ≥ 3 is still not recommended by any national professional organization. Chemotherapy use in patients with little chance of benefit and more chance of toxicity may delay discussion about prognosis and dying,23,26 which may lead to further poor quality of care such as the inappropriate use of mechanical ventilation or delays in referral to hospice, worse surviving caregiver quality of life, and high end of life care costs.24 Without PS proxies, little can be done to use automated data sources to monitor and measure either the under- or over-use of chemotherapy and its implications for patient and economic outcomes.
Our results should be interpreted in light of the following limitations. First, subjectivity is present in the assignment of PS. Even when assessed by a healthcare professional, PS scales are subjective in nature,25 and physician estimates are known to be prone to error,26 with PS usually over-estimated.14 Thus, even if our model were 100% accurate, caution would have to be used in interpreting results dependent upon an accurate classification of PS. Nonetheless, the ability to develop a useful proxy measure of PS from existing observational data will help in the use of existing national data resources such as that available with SEER-Medicare data for comparative effectiveness research. Second, our models were developed on a relatively small sample and one that is specific to one delivery system. Thus not only should care be taken when generalizing findings, but our parsimonious models may exclude important predictors of PS available in observational data. Finally, identifying patients with ‘poor’ PS by their diagnoses and use of care via claims data poses its own limitations. For instance, DME use varies significantly based on differing personal preferences and practices in addition to restrictions on reimbursement by public and private insurers. Although claims for DME offer useful information, they identify only selected people with potentially disabling conditions.27 The same is true of medical diagnoses (many of which are known to be under-captured in medical claims data) and of prescription drug dispensing, which reflects only those medications prescribed by physicians that the patient elected to fill. Yet, the ability to proxy PS is critical to the ability to use observational data to accurately draw conclusions about comparative effectiveness and cancer care quality at a population level if not at the bedside.
Despite these limitations, results from this study shed new light on the capacity of information routinely available in observational data to identify lung cancer patients with ‘good’ vs. ‘poor’ PS. This is especially useful for researchers interested in leveraging existing observational databases for comparative effectiveness research. Recent studies have highlighted likely overuse of chemotherapy in lung cancer treatment as well as aggressive treatment near end of life.28–30 Using a predictive model such as the one developed here with a threshold of 0.70 to proxy a patient as having ‘poor’ PS would ensure reasonably high specificity (0.69 if PS = 2 is considered ‘poor’) and thereby enable identification of a population for whom the receipt of chemotherapy appears inadvisable or requires a more tailored discussion of less benefit and more risk per current guideline recommendations, and for whom early hospice intervention may be warranted. On the other hand, using a lower predictive threshold (0.50) and thereby increasing the sensitivity of the predictive model may be useful to health disparities researchers, where interest might be in testing a hypothesis centered on under-treatment among minority populations. Similarly, choosing a predictive threshold with a high sensitivity could facilitate population identification for observational comparative effectiveness research. The best selection of both a predictive threshold and the allocation of PS = 2 patients will ultimately depend on the user’s objectives.
Performance status has long been considered one of the strongest prognostic factors31 and is used today by clinicians to assess the appropriateness of chemotherapy treatment and regimen choice for lung cancer patients.22 With the aging population, the number of Americans with functional limitations will increase dramatically, and therefore the urgency to capture and classify functional status information will grow.32 Furthermore, given the current challenges faced by the US health care system to deliver better and more cost-effective outcomes, the importance of comparative effectiveness studies is likely to only grow. Results from our study for the first time provide health services researchers and others with a viable tool to predict PS among lung cancer patients using information routinely available in observational data. As such, the value of observational data for comparative effectiveness research and for use by those interested in understanding cancer care quality or targeting specific lung cancer patients for possible inclusion in clinical trials, hospice care or other interventions is greatly enhanced.
This work was supported in part by grants from the Fund for Henry Ford and the National Cancer Institute. We thank Elizabeth Dobie and Nonna Akkerman for their assistance with data acquisition.
This research was supported by the Fund for Henry Ford and by the National Cancer Institute, U.S. National Institutes of Health, under grant NIH R01 CA114204-03.