To develop and validate a single numeric comorbidity score for predicting short- and long-term mortality by combining conditions in the Charlson and Elixhauser measures.
In a cohort of 120,679 Pennsylvania Medicare enrollees with drug coverage through a pharmacy assistance program, we developed a single numeric comorbidity score for predicting 1-year mortality, by combining the conditions in the Charlson and Elixhauser measures. We externally validated the combined score in a cohort of New Jersey Medicare enrollees, by comparing its performance to that of both component scores in predicting 1-year mortality, as well as 180-, 90-, and 30-day mortality.
C-statistics from logistic regression models including the combined score were higher than corresponding c-statistics from models including either the Romano implementation of the Charlson Index or the single numeric version of the Elixhauser system; c-statistics were 0.860 (95% confidence interval [CI]: 0.854, 0.866), 0.839 (95% CI: 0.836, 0.849), and 0.836 (95% CI: 0.834, 0.847), respectively, for the 30-day mortality outcome. The combined comorbidity score also yielded positive values for two recently proposed measures of reclassification.
In similar populations and data settings, the combined score may offer improvements in comorbidity summarization over existing scores.
By summarizing various medical conditions into a single numeric index, comorbidity scores can provide a standardized summary of the burden of comorbidity in a study group, increase analytic efficiency [1,2], and allow adjustment for more potentially confounding baseline conditions than would otherwise be possible. Although more complete confounding adjustment may be achieved with other variable reduction methods, such as exposure propensity score and disease risk score methods [4–7], predefined comorbidity scores may be particularly useful in settings that preclude such high-dimensional approaches, such as when the number of potential confounders is large relative to both the number of exposures and the number of outcomes. Indeed, use of comorbidity scores appears to be increasing, as suggested by the exponential increase in the number of articles citing the seminal comorbidity score papers since their publication (Figure 1).
The Charlson Index, its implementations for claims databases [10–13], and the Elixhauser comorbidity classification system are the most commonly used comorbidity measures [1,2]. The Charlson Index was developed as a prognostic index to predict 1-year mortality among patients admitted to the medical service of an acute care hospital; it assigns empirically derived weights to 19 investigator-defined, clinically important conditions. Among the various implementations of the Charlson Index for administrative data, the Romano approach, which defines each comorbidity by International Classification of Diseases (ICD)-9 diagnosis codes, with slight modifications to some conditions (e.g., leukemia and lymphoma are grouped with any tumor), consistently performs best in predicting mortality in older populations [2,15,16].
The Elixhauser system was intended to predict hospital charges, length of stay, and in-hospital mortality, and was developed by identifying comorbidities relevant to hospitalization other than the primary reason for hospitalization and the severity of that condition. As such, the Elixhauser system explicitly excludes important causes of substantial comorbidity, chiefly some of the most common causes of hospitalization and comorbidity burden in elderly patients, including myocardial infarction and stroke. Nevertheless, using a new single weighted numeric summary of the Elixhauser system, van Walraven et al showed that it outperformed the Romano/Charlson measure with Medicare weights derived by Schneeweiss et al in discriminating in-hospital death.
A natural next step in improving comorbidity scores is to combine the conditions included in the Charlson Index and the Elixhauser classification system, thereby capturing in a single comprehensive measure the comorbidity quantified by each. The objectives of this study were to combine the Romano implementation of the Charlson Index (“Romano/Charlson”) with van Walraven’s adaptation of the Elixhauser system (“van Walraven/Elixhauser”) into a single numeric score, and to empirically compare its performance in predicting short-term (i.e., 30-, 90-, and 180-day) and long-term (i.e., 1-year) mortality with that of each component measure. SAS code for the combined score can be downloaded at www.drugepi.org/downloads.
Similar to the approach described by Schneeweiss et al, this study used two cohorts: a development cohort from Pennsylvania and a validation cohort from New Jersey. We defined the development cohort as Medicare enrollees aged 65 years or older who had complete drug coverage through the Pharmacy Assistance Contract for the Elderly (PACE). Similarly, we defined the validation cohort as Medicare enrollees aged 65 years or older who had complete drug coverage through the Pharmacy Assistance for the Aged and Disabled (PAAD) program. Both PACE and PAAD provide medications at minimal expense to low-income elderly individuals whose income is too high to qualify for Medicaid.
We defined the baseline year as January 1 through December 31, 2004, and the follow-up year as January 1 through December 31, 2005. For both cohorts, we included all individuals who had at least one pharmacy claim during the 4 months before the baseline year and who survived the baseline year. A total of 120,679 individuals were eligible for the development cohort (Pennsylvania) and 123,855 for the validation cohort (New Jersey).
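The inclusion criteria can be expressed as a simple filter over enrollee records. This is a minimal sketch, assuming a hypothetical record layout (`Enrollee`, `is_eligible`, and all field names are illustrative, not from the source), that age is measured on January 1, 2004, and that "the 4 months before the baseline year" means September 1 to December 31, 2003.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Study windows from the Methods. Reading "the 4 months before the baseline
# year" as September 1 to December 31, 2003 is an assumption.
LOOKBACK_START = date(2003, 9, 1)
BASELINE_START = date(2004, 1, 1)
BASELINE_END = date(2004, 12, 31)


@dataclass
class Enrollee:
    """Hypothetical enrollee record; field names are illustrative only."""
    person_id: str
    age_at_baseline: int  # assumed: age on January 1, 2004
    pharmacy_claim_dates: List[date] = field(default_factory=list)
    death_date: Optional[date] = None


def is_eligible(p: Enrollee) -> bool:
    """Apply the age, recent-pharmacy-claim, and baseline-survival criteria."""
    if p.age_at_baseline < 65:
        return False
    # At least one pharmacy claim in the look-back window before baseline.
    has_recent_claim = any(
        LOOKBACK_START <= d < BASELINE_START for d in p.pharmacy_claim_dates
    )
    # The enrollee must be alive at the end of the baseline year.
    survived_baseline = p.death_date is None or p.death_date > BASELINE_END
    return has_recent_claim and survived_baseline
```

Filtering a list of such records with `[p for p in enrollees if is_eligible(p)]` would yield an analytic cohort; real PACE and PAAD enrollment files will be structured differently.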
For descriptive purposes, we computed several other simple measures of comorbidity using data from the baseline year. These included binary indicators for hospitalization in the baseline year, use of any prescription drug, receipt of any diagnosis, any physician visit, and whether or not patients spent time in a nursing home in the baseline year. We also measured the number of hospital days, the number of distinct prescription drugs used, the number of diagnoses, and the number of physician visits in the baseline year for each cohort.
For each patient in the development cohort, we determined the presence or absence in the baseline year of each of the 17 conditions included in Romano’s adaptation of the Charlson Index for use with claims data and each of the 30 conditions included in the Elixhauser system. Data from both hospital discharges and ambulatory physician services were used to identify the conditions according to ICD-9 codes. Some conditions (e.g. metastatic cancer) were included and defined the same way in both comorbidity measures. When similar, but not identical, conditions were included in both scores, we chose the more inclusive definition for consideration in the combined score.
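The rule for conditions defined in both measures can be sketched as follows. This is a minimal illustration under stated assumptions: each condition definition is treated as a tuple of ICD-9 code prefixes pooled across hospital and physician claims, "more inclusive" is read as "matches every code the other definition matches", and the function names and prefixes such as `X1` are hypothetical placeholders rather than real code lists (the authors' downloadable SAS code contains the actual definitions).

```python
from typing import Iterable, Tuple

# A condition definition is modeled as a tuple of ICD-9 code prefixes.
# All prefixes used with these functions here are hypothetical placeholders.
Definition = Tuple[str, ...]


def flags(codes: Iterable[str], definition: Definition) -> bool:
    """True if any claim code begins with one of the definition's prefixes."""
    return any(code.startswith(prefix) for code in codes for prefix in definition)


def covers(broad: Definition, narrow: Definition) -> bool:
    """True if every prefix in `narrow` is matched by some prefix in `broad`,
    so any code flagged by `narrow` is also flagged by `broad`."""
    return all(any(n.startswith(b) for b in broad) for n in narrow)


def more_inclusive(a: Definition, b: Definition) -> Definition:
    """Return whichever definition contains the other."""
    if covers(a, b):
        return a
    if covers(b, a):
        return b
    raise ValueError("neither definition contains the other; resolve manually")
```

When two definitions overlap only partially, the sketch refuses to choose; how such cases were resolved is not described in the text.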
We constructed a multivariable logistic regression model that included each of the 37 unique conditions plus age and sex as independent variables; the dependent variable was death during the follow-up year. We applied the weighting rule developed by Schneeweiss et al to the model coefficients to obtain a weight for each dichotomous condition: we divided each estimated logistic regression coefficient by 0.30 and rounded the result to the nearest integer. Thus, a weight of 1 corresponds to an odds ratio of exp(0.30) ≈ 1.35, i.e., a 35% increase in the odds of dying during the follow-up year, and weights increase (or decrease) by 1 point for each 0.30 increase (or decrease) in the ln(odds ratio). With this approach, variable selection is independent of the size of the development cohort, and the weights assigned to conditions do not depend on the magnitude of association between other conditions and the outcome.
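The weighting rule reduces to a one-line transformation of each coefficient. The sketch below is illustrative; the function names are hypothetical, and rounding exact .5 ties away from zero is an assumption (the text says only "nearest integer", and Python's built-in `round` would use banker's rounding).

```python
import math

SCALE = 0.30  # divisor from the Schneeweiss et al weighting rule


def weight_from_coefficient(beta: float) -> int:
    """Divide a log-odds coefficient by 0.30 and round to the nearest integer."""
    x = beta / SCALE
    # Round half away from zero (an assumption about tie handling).
    magnitude = int(math.floor(abs(x) + 0.5))
    return magnitude if x >= 0 else -magnitude


def weight_from_odds_ratio(odds_ratio: float) -> int:
    """Same rule, starting from an odds ratio instead of its logarithm."""
    return weight_from_coefficient(math.log(odds_ratio))
```

Applied to odds ratios reported in the Results, this rule gives a weight of 2 for dementia (OR 1.80) and for weight loss (OR 1.81), matching the text, and 5 for metastatic cancer (OR 5.17), since ln(5.17)/0.30 ≈ 5.48.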
Each patient was assigned the weight of every one of the 37 conditions that he or she had during the baseline year, and the patient’s combined comorbidity score was calculated as the sum of these weights.
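Scoring a patient is then a sum over the conditions present. This minimal sketch uses a partial, illustrative weight table: the dementia, weight loss, hypertension, and HIV/AIDS weights are stated in the text, the metastatic cancer weight is derived by applying the 0.30 rule to OR 5.17, and the condition keys and `combined_score` name are hypothetical. The full weight list is in Table 3 and is not reproduced here.

```python
from typing import Dict, Iterable

# Illustrative subset of combined-score weights (see lead-in for sources).
EXAMPLE_WEIGHTS: Dict[str, int] = {
    "metastatic_cancer": 5,
    "dementia": 2,
    "weight_loss": 2,
    "hypertension": -1,
    "hiv_aids": -1,
}


def combined_score(conditions: Iterable[str], weights: Dict[str, int]) -> int:
    """Sum the weights of the distinct conditions a patient has.

    Conditions absent from `weights` count as 0, which matches a table that
    lists only non-zero weights but undercounts with this partial example.
    """
    return sum(weights.get(c, 0) for c in set(conditions))
```

For example, a patient with dementia and hypertension would score 2 + (−1) = 1. Deduplicating with `set` guards against the same condition being flagged from both hospital and physician claims.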
We implemented Romano’s adaptation of the Charlson Index for claims data, van Walraven’s single numeric modification of the Elixhauser system, and the combined score in the validation cohort. To assess each measure’s ability to discriminate between patients who died and those who did not die during each follow-up period, we fit a separate logistic regression model for each measure and each outcome (i.e., 30-day, 90-day, 180-day, and 1-year mortality). Each model included the score being evaluated plus age and sex as independent variables and death during the follow-up period of interest as the dependent variable. From each model, we computed the c-statistic and its 95% confidence interval as a measure of discrimination and compared these values across the 3 comorbidity scores.
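The c-statistic equals the probability that a randomly chosen decedent has a higher predicted risk than a randomly chosen survivor (the area under the ROC curve). The sketch below computes it from predicted probabilities via the Mann-Whitney rank-sum identity, with ties counted as one half. It assumes the predicted probabilities have already been produced by a fitted model (score plus age and sex), which is not shown; the function name is hypothetical, and the confidence interval method used in the study is not specified, so it is omitted.

```python
from typing import Sequence


def c_statistic(predicted: Sequence[float], died: Sequence[int]) -> float:
    """Concordance (AUC) of predicted risks against a 0/1 death indicator."""
    n = len(predicted)
    order = sorted(range(n), key=lambda i: predicted[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied predicted values.
        while j + 1 < n and predicted[order[j + 1]] == predicted[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_dead = sum(died)
    n_alive = n - n_dead
    if n_dead == 0 or n_alive == 0:
        raise ValueError("need at least one death and one survivor")
    rank_sum_dead = sum(r for r, d in zip(ranks, died) if d == 1)
    # Mann-Whitney U for decedents, scaled to the number of case-control pairs.
    return (rank_sum_dead - n_dead * (n_dead + 1) / 2) / (n_dead * n_alive)
```

The rank-based form runs in O(n log n), which matters at the scale of the validation cohort, where a naive loop over all decedent-survivor pairs would involve roughly a billion comparisons.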
Following the methods described by van Walraven et al, we assessed the calibration of the scores in the validation cohort by comparing the observed and expected proportions of deaths at each value of each of the 3 scores that contained at least 1% of study patients. Score values containing less than 1% of patients were aggregated with adjacent values. We used exact methods to compute the 95% confidence interval around the observed proportion of deaths at each score value, and deemed the observed and expected proportions similar if the expected proportion fell within this interval.
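A common reading of "exact methods" for a binomial proportion is the Clopper-Pearson interval; the text does not name the method, so that choice is an assumption here. The sketch finds each bound by bisection on the binomial tail probability (computed in log space so large score-level counts do not overflow) and then applies the calibration check described above. All function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p), summed in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = sum(
        math.exp(log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * log_p + (n - k) * log_q)
        for k in range(x + 1)
    )
    return min(total, 1.0)


def _bisect(f: Callable[[float], float], target: float,
            increasing: bool, iters: int = 60) -> float:
    """Solve f(p) = target on [0, 1] for a monotone f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def exact_ci(deaths: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson confidence interval for the proportion deaths / n."""
    lower = 0.0 if deaths == 0 else _bisect(
        lambda p: 1.0 - binom_cdf(deaths - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if deaths == n else _bisect(
        lambda p: binom_cdf(deaths, n, p), alpha / 2, increasing=False)
    return lower, upper


def is_calibrated(deaths: int, n: int, expected: float, alpha: float = 0.05) -> bool:
    """Observed and expected agree if the expected proportion is inside the CI."""
    lo, hi = exact_ci(deaths, n, alpha)
    return lo <= expected <= hi
```

Each score level (after pooling levels with fewer than 1% of patients) would be passed to `is_calibrated` with its observed deaths, its size, and the model's mean predicted probability for that level.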
We also used recently proposed reclassification measures [20–22] to compare the predictive performance of the combined comorbidity score with that of each of its constituent scores in the validation cohort. We created a reclassification table as described by Cook and colleagues [20,23] by stratifying individuals according to their risk of 1-year mortality as predicted by the model including the Romano/Charlson score and by the model including the combined comorbidity score; both models were also adjusted for age and sex. We defined low-, intermediate-, and high-risk strata based on predicted probabilities of each mortality outcome among those who died and those who did not die during the follow-up interval. We created a corresponding table comparing the van Walraven/Elixhauser score with the combined comorbidity score, and repeated both comparisons for the 30-, 90-, and 180-day mortality outcomes.
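A reclassification table cross-tabulates each individual's risk stratum under the comparator model against the stratum under the combined-score model. In the minimal sketch below, the cutpoints are hypothetical (the values used in the study are not given in this text), values equal to a cutpoint fall into the higher stratum, and the function names are illustrative. In practice the table is usually built separately for those who died and those who did not.

```python
from bisect import bisect_right
from collections import Counter
from typing import Sequence


def stratum(prob: float, cutpoints: Sequence[float]) -> int:
    """0 = low, 1 = intermediate, 2 = high (for two sorted cutpoints)."""
    return bisect_right(cutpoints, prob)


def reclassification_table(p_old: Sequence[float], p_new: Sequence[float],
                           cutpoints: Sequence[float]) -> Counter:
    """Count individuals by (old-model stratum, new-model stratum)."""
    return Counter(
        (stratum(a, cutpoints), stratum(b, cutpoints))
        for a, b in zip(p_old, p_new)
    )


def percent_reclassified(table: Counter) -> float:
    """Percentage of individuals off the diagonal of the table."""
    total = sum(table.values())
    moved = sum(n for (old, new), n in table.items() if old != new)
    return 100.0 * moved / total
```

For example, with hypothetical cutpoints of (0.05, 0.20), a patient whose predicted risk moves from 0.10 to 0.25 shifts from the intermediate to the high stratum and is counted as reclassified.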
From these tables and the underlying predicted probabilities, we computed 3 reclassification measures. First, we computed the overall percentage of individuals reclassified into new risk strata by the model including the combined comorbidity score versus the model including the Romano/Charlson score, and likewise versus the model including the van Walraven/Elixhauser score. Second, we calculated the net reclassification improvement (NRI) as NRI = [Pr(up|D = 1) − Pr(down|D = 1)] + [Pr(down|D = 0) − Pr(up|D = 0)], where D = 1 if a patient died during the follow-up period and D = 0 otherwise, and “up” and “down” indicate whether the combined comorbidity score reclassified an individual into a higher or lower risk stratum, respectively. The NRI can be interpreted as the sum of the net improvements in classification among those who experienced the outcome and those who did not, with positive values suggesting that the combined score assigns patients to the correct risk strata more often than does the constituent score. Third, we calculated the integrated discrimination improvement (IDI), which is the change, from the constituent score to the combined score, in the difference in mean predicted probabilities between those who died and those who did not die during follow-up. Positive values indicate that the combined score discriminates mortality during follow-up better than does the score to which it is compared. We used the asymptotic tests derived by Pencina et al to test the null hypotheses that NRI = 0 and IDI = 0.
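The NRI and IDI definitions above translate directly into code. This is a minimal sketch: `p_old` and `p_new` are predicted probabilities from the comparator and combined-score models, `died` is a 0/1 indicator, the cutpoints are hypothetical, and the function names are illustrative. Pencina's asymptotic tests are not implemented here.

```python
from bisect import bisect_right
from statistics import mean
from typing import Sequence


def nri(p_old: Sequence[float], p_new: Sequence[float],
        died: Sequence[int], cutpoints: Sequence[float]) -> float:
    """[Pr(up|D=1) - Pr(down|D=1)] + [Pr(down|D=0) - Pr(up|D=0)]."""
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    n = {0: 0, 1: 0}
    for a, b, d in zip(p_old, p_new, died):
        s_old, s_new = bisect_right(cutpoints, a), bisect_right(cutpoints, b)
        n[d] += 1
        if s_new > s_old:
            up[d] += 1
        elif s_new < s_old:
            down[d] += 1
    events = (up[1] - down[1]) / n[1]
    nonevents = (down[0] - up[0]) / n[0]
    return events + nonevents


def idi(p_old: Sequence[float], p_new: Sequence[float],
        died: Sequence[int]) -> float:
    """Change in (mean risk among decedents - mean risk among survivors)."""
    def slope(p: Sequence[float]) -> float:
        dead = mean(x for x, d in zip(p, died) if d == 1)
        alive = mean(x for x, d in zip(p, died) if d == 0)
        return dead - alive
    return slope(p_new) - slope(p_old)
```

Unlike the NRI, the IDI uses no cutpoints, which is why it can be computed from the predicted probabilities alone rather than from the reclassification table.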
The two cohorts were similar in demographic characteristics and most baseline measures of healthcare utilization (Table 1). Members of the development cohort had more diagnoses, on average, than members of the validation cohort (mean [SD]: 20.6 [13.1] versus 15.2 [10.7] diagnoses in the baseline year) and slightly fewer physician visits (median [IQR]: 7.0 [7.0] versus 9.0 [9.0] visits in the baseline year). A total of 10,769 deaths (8.9%) occurred during the follow-up year in Pennsylvania and 9,230 (7.5%) in New Jersey.
The prevalence in the development cohort of each condition considered in the combined score is displayed in Table 2, along with the results of the logistic regression analysis and the corresponding new weights. In general, the relative importance of conditions, as reflected by their weights in the constituent scores, was preserved in the combined score. For example, a diagnosis of metastatic cancer carries the highest weight in all 3 scores. In our development cohort, the odds of 1-year mortality were more than 5 times greater among patients with a diagnosis of metastatic cancer than among those without it (odds ratio [OR]: 5.17; 95% confidence interval [CI]: 4.66, 5.73), after accounting for age, sex, and all other comorbidities in the model.
Several conditions included in only one of the constituent measures were relatively important predictors and received relatively high weights in the combined score. For example, the odds of 1-year mortality among patients with a diagnosis of dementia were 80% higher than among those without (OR: 1.80; 95% CI: 1.69, 1.91); dementia, which is not among the Elixhauser conditions, therefore received a weight of 2 in the combined score. Conversely, the Elixhauser system assigns a high weight to weight loss, which the Romano/Charlson score does not include. In our development cohort, a diagnosis of weight loss was strongly predictive of 1-year mortality (OR: 1.81; 95% CI: 1.62, 2.03) and received a weight of 2 in the combined score. The final combined comorbidity score, including only conditions with non-zero weights, is presented in Table 3.
The distributions of the 3 scores in the validation cohort are depicted in Figure 2. Table 4 summarizes results for both the development and validation cohorts. The c-statistic for the combined score was 0.860 (95% CI: 0.854, 0.866) for predicting 30-day mortality in the validation cohort compared to 0.839 (95% CI: 0.836, 0.849) for the Romano/Charlson measure and 0.836 (95% CI: 0.834, 0.847) for the van Walraven/Elixhauser score (Table 5). For each measure, the c-statistic decreased monotonically with increasing follow-up for mortality. The absolute differences in c-statistics between the combined score and each of the two component scores also decreased with increasing mortality follow-up.
The observed and predicted proportions of death at each value of each measure are plotted in Figure 2 for the 1-year mortality outcome. These proportions were similar at most levels of each score, in that the 95% CI for the observed proportion contained the predicted proportion. The reclassification tables (Table 6 and Appendix Tables 1–3) show the number of individuals who were reclassified into new risk strata when the combined comorbidity score was compared with either of its component scores. Overall, for the 1-year mortality outcome, 15.2% of individuals were reclassified relative to each of the Romano/Charlson and van Walraven/Elixhauser scores. Fewer individuals were reclassified for the mortality outcomes with shorter follow-up (Table 7). Both the NRI and the IDI yielded positive values for all outcomes when the combined score was compared with either the Romano/Charlson score or the van Walraven/Elixhauser score (Table 7). The NRI reflects the net proportion of patients correctly reclassified by the combined comorbidity score relative to each constituent score. Compared with the Romano/Charlson score, the combined score correctly reclassified approximately 2% of patients who died during the follow-up year and about 3% of those who did not die. Compared with the van Walraven/Elixhauser score, it correctly reclassified approximately 4% and 2.5% of those who did and did not die, respectively. The IDI reflects the change in the difference in average predicted probabilities between those who died and those who did not die during follow-up.
The average predicted probability of 1-year mortality among those who died during the follow-up year was higher for the combined score (17.9%) than for the Romano/Charlson score (16.6%) and the van Walraven/Elixhauser score (16.3%), but the average probabilities were similar across the three measures among those who did not die during the follow-up year (6.6% for the combined score, 6.7% for the Romano/Charlson score, and 6.8% for the van Walraven/Elixhauser score).
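As a rough check, the IDI implied by these average predicted probabilities can be worked out directly from its definition: the difference in mean predicted risk between decedents and survivors under the combined score, minus the same difference under the comparator score. Because the averages above are rounded, these values are approximations and may differ slightly from the IDI values reported in Table 7.

$$
\begin{aligned}
\text{IDI}_{\text{vs. Romano/Charlson}} &\approx (17.9 - 6.6) - (16.6 - 6.7) = 11.3 - 9.9 = 1.4 \text{ percentage points},\\
\text{IDI}_{\text{vs. van Walraven/Elixhauser}} &\approx (17.9 - 6.6) - (16.3 - 6.8) = 11.3 - 9.5 = 1.8 \text{ percentage points}.
\end{aligned}
$$

Both are positive, consistent with the Results; most of the gain comes from higher predicted risk among decedents rather than lower predicted risk among survivors.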
In an independent external validation study, a single numeric comorbidity score that considers conditions in both the Romano implementation of the Charlson Index and the Elixhauser comorbidity classification system performed numerically better in predicting both short- and long-term mortality than either the Romano/Charlson score with Medicare weights or the van Walraven single numeric modification of the Elixhauser measure. Although differences in c-statistics among the 3 comorbidity measures appear small, it has been demonstrated that even slight improvements in the c-statistic for such indices can translate into measurable reductions in confounding bias. Furthermore, this potential benefit comes at no added expense since the combined score is as easy to apply as either of its constituent scores.
Results of the validation study suggest that the differences in discriminative ability between the combined score and its two component scores are larger for mortality with shorter follow-up. Factors measured more recently are likely better predictors of an outcome than factors measured in the more distant past, as reflected by the decrease in the c-statistic for each score with increasing follow-up time. Thus, as the ability of covariates to predict an outcome decreases, the discriminative abilities of different scores based on them become more similar.
Several factors contribute to the difference in performance between the combined comorbidity score and its component scores. Although the populations, data, and endpoint of interest are similar to those used to derive the original Medicare weights for the Romano/Charlson score, the combined score incorporates weights derived from more recent data, and improvements in treatment and clinical practice over time modify disease prognosis. For example, the weight for HIV/AIDS was 4 in the original Medicare weights, based on data from 1995, but −1 in the new weighting scheme, based on data from 2004. Although the prevalence of HIV/AIDS was low in our cohorts, this change may highlight the importance of periodically updating weights to reflect changes in prognosis, and of using comorbidity scores whose weights were derived from data that accurately reflect the practice and prognosis of the population in which a study is to be conducted.
The populations, data source, and endpoint that we used differ markedly from those used to derive the van Walraven/Elixhauser score. For example, van Walraven et al predicted inpatient mortality, whereas we developed the combined score using 1-year mortality, and scores that predict certain endpoints relatively well may predict other outcomes poorly. Additionally, van Walraven et al used hospital data spanning many years (1996–2008). The accuracy of ascertaining specific comorbidities may differ between hospital records and Medicare claims data, and prognosis changes over time, as discussed above. Finally, van Walraven et al used a different scoring algorithm and did not include age and sex in their models, which may explain the greater variability in weights in their score. Adjusting for age and sex partially accounts for conditions that become increasingly common with older age, so the independent effect of these conditions on mortality is smaller. Whether the combined score discriminates inpatient mortality better than the van Walraven/Elixhauser score remains to be determined. An interesting endeavor would be to apply the approach used here to derive weights for a combined score that predicts inpatient death using hospital data.
An important point emphasized by several authors [14,17] is that weights for individual comorbidities should be interpreted cautiously. In the combined score, hypertension and HIV/AIDS received weights of −1 because the coefficients for these conditions in the multivariable model were slightly less than zero. Obviously, this finding should not lead one to conclude that these conditions prevent 1-year mortality. Rather, diagnosis codes indicating certain conditions may themselves be markers of other factors that are inversely associated with 1-year mortality, or may reflect idiosyncrasies of administrative data. For example, recording of conditions that are not immediately life-threatening (e.g., hypertension) may reflect the general absence of more severe conditions and thus indicate a relatively healthy individual. Such idiosyncrasies of healthcare claims data limit the direct clinical applicability of comorbidity scores derived from them.
Although the combined comorbidity score may be advantageous over existing measures, relying on comorbidity scores alone may not be a prudent approach to controlling confounding in epidemiologic studies when additional methods can be applied [1,2,26]. The extent to which conventional multivariable methods, study-specific disease risk scores, or exposure propensity scores improve confounding adjustment beyond comorbidity scores warrants further study. However, conventional methods and study-specific variable reduction methods are often impractical. Conventional multivariable methods can produce bias due to over-fitting when relatively few outcomes are available per covariate included in the model. Furthermore, studies that involve both few exposures and few outcomes can preclude fitting models for both propensity scores and disease risk scores. In addition, a single numeric summary of comorbidity simplifies modeling interactions between comorbidity and other covariates, because each covariate can be interacted with one score rather than with every component of the score. Thus, while study-specific consideration of confounding is important, researchers may continue to find value in predefined comorbidity scores.
Nevertheless, several limitations of the combined comorbidity score should be noted. First, we developed and validated the score in an elderly population using Medicare claims data, and did so to predict 1-year mortality. The robustness of the score, and its performance relative to other measures, when applied to different study populations, data settings, durations of follow-up, and endpoints should be investigated. Additionally, our comparative assessment of the 3 measures is limited in several ways. Some authors have cautioned against over-reliance on the c-statistic to compare the predictive ability of different models, largely because it is insensitive to the addition of important factors to a prediction model. We therefore also calculated several recently proposed measures of reclassification. Positive values for both the NRI and the IDI indicate that the combined comorbidity score performed better than either the Romano/Charlson score or the van Walraven/Elixhauser score. However, the NRI depends on the cutpoints used to define risk strata; these cutpoints should therefore be defined a priori and should reflect clinically meaningful thresholds. Furthermore, the properties of these statistics are still being evaluated, and the IDI may be less useful than other reclassification measures because small absolute changes in predicted probabilities yield small IDI values even when the changes are large on a relative scale, as can occur when outcomes are rare. In the validation cohort, 7.5% of patients died during the follow-up year, and this proportion decreased to 0.7% for 30-day follow-up.
In conclusion, we created a comorbidity score by combining the conditions included in the Charlson Index and the Elixhauser system, and derived weights to predict 1-year mortality in a Medicare population aged 65 years and older using data from 2004. In external validation, this combined score performed numerically better in discriminating both short- and long-term mortality than either the Romano/Charlson score or the van Walraven/Elixhauser score, based on the c-statistic, and both measures of reclassification (NRI and IDI) also favored the combined score. In similar populations and data settings, this score may facilitate better confounding control than existing measures, without any added investigator burden.
This research was supported by research grants from the National Institute on Aging (RO1-AG018833) to Dr. Glynn, and the National Library of Medicine (RO1-LM10213) to Dr. Schneeweiss. Dr. Gagne is supported by a National Institute on Aging training grant (T32-AG000158).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Joshua J. Gagne, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120, T: 617-278-0930, F: 617-232-8602.
Robert J. Glynn, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120, T: 617-278-0930, F: 617-232-8602.
Jerry Avorn, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120, T: 617-278-0930, F: 617-232-8602.
Raisa Levin, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120, T: 617-278-0930, F: 617-232-8602.
Sebastian Schneeweiss, Division of Pharmacoepidemiology and Pharmacoeconomics, Brigham and Women’s Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120, T: 617-278-0930, F: 617-232-8602.