In an independent external validation study, a single numeric comorbidity score that considers conditions in both the Romano implementation of the Charlson Index and the Elixhauser comorbidity classification system performed numerically better in predicting both short- and long-term mortality than either the Romano/Charlson score with Medicare weights or the van Walraven single numeric modification of the Elixhauser measure. Although differences in c-statistics among the 3 comorbidity measures appear small, it has been demonstrated that even slight improvements in the c-statistic for such indices can translate into measurable reductions in confounding bias [2]. Furthermore, this potential benefit comes at no added expense since the combined score is as easy to apply as either of its constituent scores.
Results of the validation study suggest that the difference in discriminative ability between the combined score and its two component scores is larger for mortality with shorter follow-up. Factors measured more recently are likely better predictors of an outcome than factors measured in the distant past, as reflected by the decrease in c-statistic for each score with increasing follow-up time. Thus, as the ability of covariates to predict an outcome decreases, the overall discriminative abilities of different scores based on them become more similar.
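For readers less familiar with the c-statistic, the following is a minimal sketch of its definition for a binary outcome: the proportion of all (died, survived) patient pairs in which the patient who died has the higher score, with ties counted as one half. The function name `c_statistic` and the toy inputs are illustrative assumptions. The study's own estimates may have come from fitted regression models rather than from raw score values.

```python
from itertools import product


def c_statistic(scores, died):
    """Concordance probability for a binary outcome.

    scores: risk scores, one per patient (higher = higher predicted risk)
    died:   matching 0/1 outcome indicators
    """
    cases = [s for s, d in zip(scores, died) if d]
    controls = [s for s, d in zip(scores, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for case_score, control_score in product(cases, controls):
        if case_score > control_score:
            concordant += 1.0   # the patient who died was ranked higher
        elif case_score == control_score:
            concordant += 0.5   # tied scores count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: four patients, two of whom died.
perfect = c_statistic([0, 1, 2, 3], [0, 0, 1, 1])
tied = c_statistic([1, 1], [1, 0])
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which is why small absolute differences between scores can still matter.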
Several factors contribute to the difference in performance between the combined comorbidity score and its component scores. While the populations, data, and endpoint of interest are similar to those used to derive the original Medicare weights for the Romano/Charlson score, the combined score incorporates weights derived from more recent data. Improvements in treatment and clinical practice over time modify disease prognosis. For example, the weight for HIV/AIDS from the original Medicare weights was 4 based on data from 1995 and was −1 in the new weighting scheme based on data from 2004. While the prevalence of HIV/AIDS was low in our cohorts, the change in weights may highlight the importance of periodically updating weights to reflect changes in prognosis and also of using comorbidity scores based on weights derived from data that accurately reflect practice and prognosis of a particular population in which a study is to be conducted.
The populations, data source, and endpoint that we used differ markedly from those used to derive the van Walraven/Elixhauser score. For example, van Walraven et al predicted inpatient mortality, whereas we developed the combined score using 1-year mortality. Scores that predict certain endpoints relatively well may poorly predict other outcomes [2]. Additionally, van Walraven et al used hospital data that spanned many years (1996–2008). Accuracy in ascertaining specific comorbidities may differ when using data based on hospital records versus Medicare claims data [24]; additionally, the impact of changes in prognosis over time is discussed above. Finally, van Walraven et al used a different scoring algorithm and did not include age and sex in their models, which may explain the greater variability in weights in their score. Adjusting for age and sex partially adjusts for those conditions that are increasingly common in older age; thus the independent effect of these conditions on mortality is smaller. Whether the combined score can better discriminate inpatient mortality compared to the van Walraven/Elixhauser score remains to be determined. However, an interesting endeavor would be to apply the same approach used here to derive weights for a combined score based on predicting inpatient death using hospital data.
An important point emphasized by several authors [14] is that interpretation of weights for individual comorbidities should be done cautiously. In the combined score, hypertension and HIV/AIDS received weights of −1 because the coefficients for these conditions in the multivariable model were slightly less than zero. Obviously, this finding should not lead one to conclude that these conditions prevent 1-year mortality. Rather, presence of diagnosis codes indicating existence of certain conditions may themselves be indicators for other factors that are inversely associated with 1-year mortality or may reflect idiosyncrasies of administrative data. For example, recording of conditions that are themselves not immediately life-threatening (e.g. hypertension) may reflect the general absence of more severe conditions and thus indicate a relatively healthy individual [25]. Such idiosyncrasies of healthcare claims data limit the direct clinical applicability of comorbidity scores derived from them.
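To make the role of a negative weight concrete, here is a small sketch that assumes the score is the sum of the integer weights of the conditions recorded for a patient. Only the two weights of −1 come from the text above; `condition_x`, its weight of 2, and the function name `comorbidity_score` are hypothetical placeholders, not values or names from the study.

```python
# Weights stated in the text: hypertension and HIV/AIDS each received -1.
# "condition_x" and its weight of 2 are hypothetical placeholders.
WEIGHTS = {
    "hypertension": -1,
    "hiv_aids": -1,
    "condition_x": 2,
}


def comorbidity_score(recorded_conditions, weights=WEIGHTS):
    """Sum the weights of the distinct recorded conditions.

    Conditions without a weight contribute 0, and a condition that is
    coded more than once is counted only once.
    """
    return sum(weights.get(condition, 0) for condition in set(recorded_conditions))


only_hypertension = comorbidity_score(["hypertension"])
with_placeholder = comorbidity_score(["hypertension", "condition_x"])
```

In this sketch a recorded hypertension code lowers the total by one point. That reduction describes how the code behaves as a predictor in claims data, not any protective effect of the condition.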
Although the combined comorbidity score may be advantageous over existing measures, reliance on comorbidity scores alone may not be a prudent approach to control for confounding in epidemiologic studies when additional methods can be applied [1]. The extent to which conventional multivariable methods or study-specific disease risk scores or exposure propensity scores improve confounding adjustment beyond comorbidity scores warrants further study. However, it is often the case that conventional methods and study-specific variable reduction methods are impractical. Bias due to over-fitting can result from conventional multivariable methods when relatively few outcomes are available per number of covariates included in the model [27]. Furthermore, studies that involve both few exposures and few outcomes can preclude fitting of models for both propensity scores and disease risk scores [8]. In addition, a single numeric summary of comorbidity facilitates the modeling of interactions of comorbidity with other covariates rather than modeling interactions between covariates and all components of the comorbidity score. Thus, while study-specific considerations of confounding are important, researchers may continue to find value in predefined comorbidity scores.
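The point about interactions can be illustrated by counting terms. The sketch below assumes a logistic model for the outcome $Y$, a single covariate $X$, a summary score $S$, and $k$ separate comorbidity indicators $C_1, \dots, C_k$; these symbols and the model form are illustrative assumptions, not the study's specification.

```latex
% Single numeric score S: one interaction term with covariate X
\operatorname{logit} P(Y = 1) = \beta_0 + \beta_1 S + \beta_2 X + \beta_3 (S \times X)

% k separate comorbidity indicators: k interaction terms with X
\operatorname{logit} P(Y = 1) = \beta_0 + \sum_{j=1}^{k} \gamma_j C_j + \beta_2 X
    + \sum_{j=1}^{k} \delta_j (C_j \times X)
```

With the summary score the interaction costs one parameter ($\beta_3$), whereas the component-wise model needs $k$ of them ($\delta_1, \dots, \delta_k$), which matters when outcomes are few.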
Nevertheless, several limitations of the combined comorbidity score should be noted. First, we developed and validated the score in an elderly population, using Medicare claims data, and did so to predict 1-year mortality. The sensitivity of the score and its performance relative to other measures when applied to different study populations, data settings, durations of follow-up, and endpoints should be investigated. Additionally, our comparative assessment of the 3 measures is limited in several ways. Some authors have cautioned against over-reliance on the c-statistic to compare the predictive ability of different models, largely because it is insensitive to the addition of important factors in a prediction model [28]. Thus, we also calculated several recently proposed measures of reclassification [23]. Positive values for both the NRI and the IDI indicate that the combined comorbidity score performed better than either the Romano/Charlson score or the van Walraven/Elixhauser score. However, the NRI depends on the cutpoints used to define risk strata; thus, they should be defined a priori and should reflect clinically meaningful thresholds. Furthermore, the properties of these statistics are still being evaluated and the IDI may be less useful than other reclassification measures since small absolute changes in predicted probabilities lead to small values for the IDI [29] even if the changes are large on a relative scale, as can occur when outcomes are rare. In the validation cohort, 7.4% of patients died during the follow-up year and this decreased to 0.7% for 30-day follow-up.
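The dependence of the NRI on cutpoints, and the absolute-scale nature of the IDI, can be seen in a short sketch based on the commonly used definitions of the two measures. The function names, the convention that a probability equal to a cutpoint falls in the higher stratum, and all numbers below are hypothetical; they are not the study's code or data.

```python
from bisect import bisect_right
from statistics import mean


def risk_category(p, cutpoints):
    """Index of the risk stratum for probability p (cutpoints sorted ascending).

    A probability exactly equal to a cutpoint is placed in the higher stratum.
    """
    return bisect_right(cutpoints, p)


def nri(p_old, p_new, event, cutpoints):
    """Categorical net reclassification improvement of the new model over the old."""
    moves_events, moves_nonevents = [], []
    for po, pn, e in zip(p_old, p_new, event):
        move = risk_category(pn, cutpoints) - risk_category(po, cutpoints)
        (moves_events if e else moves_nonevents).append(move)
    if not moves_events or not moves_nonevents:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    net_events = mean((m > 0) - (m < 0) for m in moves_events)
    net_nonevents = mean((m < 0) - (m > 0) for m in moves_nonevents)
    return net_events + net_nonevents


def idi(p_old, p_new, event):
    """Integrated discrimination improvement: mean change in predicted
    probability among events minus the mean change among non-events."""
    gain_events = mean([pn - po for po, pn, e in zip(p_old, p_new, event) if e])
    gain_nonevents = mean([pn - po for po, pn, e in zip(p_old, p_new, event) if not e])
    return gain_events - gain_nonevents


# Hypothetical predicted probabilities for four patients (two events).
p_old = [0.05, 0.15, 0.30, 0.05]
p_new = [0.15, 0.30, 0.15, 0.02]
event = [1, 1, 0, 0]

nri_two_cutpoints = nri(p_old, p_new, event, [0.10, 0.20])
nri_one_cutpoint = nri(p_old, p_new, event, [0.20])
idi_value = idi(p_old, p_new, event)
```

On these toy inputs the same pair of models yields an NRI of 1.5 with cutpoints at 0.10 and 0.20 but 1.0 with a single cutpoint at 0.20, which is why cutpoints should be fixed in advance. The IDI uses no cutpoints but works on absolute probability differences, so it shrinks when predicted risks are uniformly small.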
In conclusion, we created a comorbidity score by combining conditions included in both the Charlson Index and the Elixhauser system and derived weights to predict 1-year mortality in a Medicare population aged 65 years and older using data from 2004. Based on external validation, this combined score performed numerically better in discriminating both short- and long-term mortality as compared to either the Romano/Charlson score or the van Walraven/Elixhauser score, based on the c-statistic, but results based on measures of reclassification were mixed. In similar populations and data settings, this score may facilitate better confounding control than existing measures, without any added investigator burden.
WHAT IS NEW
- A comorbidity score combining conditions from the Charlson and Elixhauser measures predicts mortality better than either of the constituent scores
- Greater comorbidity summarization with the combined score can lead to better confounding control with no added investigator burden
- Comorbidity scores predict outcomes occurring in the near-term better than outcomes occurring over the long-term