To develop and validate a disease-specific automated inpatient mortality risk adjustment system primarily using computerized numerical laboratory data supplemented with administrative data, and to assess the value of additional manually abstracted data.
Using 1,271,663 discharges in 2000–2001, we derived 39 disease-specific automated clinical models with demographics, laboratory findings on admission, ICD-9 principal diagnosis subgroups, and secondary diagnosis-based chronic conditions. We then added manually abstracted clinical data to the automated clinical models (manual clinical models). We compared model discrimination, calibration, and relative contribution of each group of variables. We validated these 39 models using 1,178,561 discharges in 2004–2005.
The overall mortality was 4.6 percent (n=58,300) and 4.0 percent (n=47,279) for derivation and validation cohorts, respectively. Common mortality predictors included age, albumin, blood urea nitrogen or creatinine, arterial pH, white blood cell counts, glucose, sodium, hemoglobin, and metastatic cancer. The average c-statistic for the automated clinical models was 0.83. Adding manually abstracted variables increased the average c-statistic to 0.85 with better calibration. Laboratory results displayed the highest relative contribution in predicting mortality.
A small number of numerical laboratory results and administrative data provided excellent risk adjustment for inpatient mortality for a wide range of clinical conditions.
Comparison of health care outcomes is of interest to both the clinical community and the public (Halm and Chassin 2001; Fonarow and Peterson 2009; VanLare, Conway, and Sox 2010). New funding for comparative research from the American Recovery and Reinvestment Act of 2009 (U.S. Congress), coupled with health care reform, has generated renewed interest as well as concern about methods of comparative effectiveness research and performance reporting (Fonarow and Peterson 2009; Gibbons et al. 2009). When comparing health care outcomes in large populations, clinically credible risk adjustment methodology that can be implemented on a large scale at low cost is important. Although clinical trials are the standard method of assessing health care effectiveness, they have high data collection costs, tend to be conducted on relatively small and homogeneous patient populations, and are not practical for all types of research. As a complement, observational studies enable large-scale investigations of outcomes, which may be more applicable to real-world settings (VanLare, Conway, and Sox 2010). Observational studies have been further advanced by the development and proliferating use of technology that enables electronic capture of clinical data. A 2008 survey of representative U.S. hospitals found that 77 percent had fully implemented electronic laboratory reporting and an additional 14 percent had partially implemented it or were in the process of doing so (Jha et al. 2009).
Recent publications demonstrated that automated laboratory data offer clinical credibility, objectivity, parsimony, and cost-effectiveness for risk adjustment (Jordan et al. 2007; Tabak, Johannes, and Silber 2007; Escobar et al. 2008; Render et al. 2008). Among demographics, comorbidities, and other groups of variables, laboratory data were found to contribute most in predicting mortality (Tabak, Johannes, and Silber 2007; Escobar et al. 2008; Render et al. 2008). However, existing studies either did not assess the contribution of additional clinical data, such as vital signs, in predicting mortality (Escobar et al. 2008; Render et al. 2008) or limited the patient population primarily to male and ICU patients (Render et al. 2008). Tabak, Johannes, and Silber (2007) developed and validated disease-specific mortality predictive models, evaluating both cumulative and relative contributions of laboratory data in relation to demographics, administrative data, and other manually collected clinical data. Their analysis, nevertheless, was limited to only six common clinical conditions. In a large patient population using disease-specific modeling, we sought to extend the previous work to a broad array of clinical conditions by addressing whether the promising laboratory results observed in a few common clinical conditions are reproducible for other, less frequently studied conditions. We further evaluated the value of manually extracted vital signs and mental status data for model predictive ability in relation to electronically captured laboratory results, demographics, and diagnosis-based administrative data. Because health care data are complex, prioritizing the electronic capture and utilization of the most standardized data elements for population-based research seems prudent. In addition to numerical laboratory results, vital signs are also objective and quantitative.
Hence, determining the value of vital signs in risk adjustment may inform policy makers regarding the relative importance and priority of electronic data capture, storage, and transmission, given the federal government's commitment to invest billions of dollars in the coming years to encourage the widespread adoption of health information technology in the United States (Blumenthal 2010).
We used one of the Clinical Research Databases from CareFusion (Formerly Cardinal Health Clinical Research Database [Clinical Research Services, Marlborough, MA]). This database has been used for research since the late 1980s and the data collection system has been fully described elsewhere (Iezzoni and Moskowitz 1988; Silber et al. 1995; Fine et al. 1997; Kollef et al. 2005; Aujesky et al. 2006; Shorr et al. 2006, 2009; Pine et al. 2007; Tabak, Johannes, and Silber 2007; Hollenbeak et al. 2008; Tabak et al. 2009; Weigelt, Lipsky, and Tabak 2010). The current study population consisted of 1,271,663 discharges in 2000–2001 from 217 hospitals for the derivation cohort and 1,178,561 discharges in 2004–2005 from 191 hospitals for the validation cohort.
The study database included imported hospital administrative data comprising demographics, principal diagnosis, and up to 25 secondary diagnosis codes. The database also contained electronically imported or manually abstracted laboratory data, vital signs, and other clinical findings. The derivation and validation cohorts had similar laboratory data completion rates. A total of 96 percent of patients had laboratory data on the day of admission. For the 2 percent of patients who did not have laboratory data recorded on admission day, data collection extended to 30 hours after admission. For surgical patients, laboratory data were eligible if obtained before the surgery start time, provided surgery occurred within the admission window. If surgery occurred later than the admission data collection window, data collected within the admission window were used. About 2 percent of cases were recorded as missing laboratory data for the specified data collection window. For patients with multiple laboratory assessments on admission day, the worst value was collected.
For this study, we selected 39 major disease groups based on volume of admissions and associated inpatient mortality rate. These disease groups covered clinical conditions of all major organ systems, including the nervous, circulatory, digestive, hepatobiliary/pancreatic, musculoskeletal, metabolic, and kidney/urinary systems, as well as infectious diseases. Patients were classified into one of these mutually exclusive disease groups based on their principal diagnosis. Each patient had only one principal diagnosis for a given admission.
We first developed 39 automated clinical models—one for each disease group—using demographics, numerical laboratory findings on admission, principal diagnosis subgroups based on ICD-9 codes, and chronic conditions based on secondary diagnoses. For each disease group, we examined the distribution of each continuous variable in relation to in-hospital death. We partitioned each continuous variable into multiple discrete levels. A category for patients with missing laboratory data was created; the mortality of this group was compared with that of a reference group, into which it was then pooled (Pine et al. 2007; Tabak, Johannes, and Silber 2007). This approach allowed us to use data on all the patients and is more practical for large-scale implementation than imputation or dropping patients with missing data. All candidate variables that were statistically associated with mortality (p<.05) were included as potential covariates. Variable selection in multivariable regression models was based on clinical plausibility and statistical significance.
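The partitioning of a continuous laboratory value into discrete levels, with missing results pooled into the reference category, can be sketched as follows. This is an illustrative sketch, not the study's implementation; the cut points reuse the WBC strata quoted in this paper's pneumonia example, and the level names are hypothetical.

```python
def categorize_wbc(wbc):
    """Map a WBC count (10^9/L) to a discrete risk level.

    Illustrative sketch only: cut points reuse the WBC strata cited in
    the paper's pneumonia example. Missing results are pooled into the
    reference category, so every patient remains in the model.
    """
    if wbc is None:
        return "reference"          # missing result pooled with reference group
    if wbc <= 4.3:
        return "low"                # neutropenia
    if wbc <= 10.9:
        return "reference"          # normal range
    if wbc <= 14.1:
        return "mildly_elevated"
    if wbc <= 19.8:
        return "moderately_elevated"
    return "markedly_elevated"
```

Each resulting level would enter the regression as an indicator variable against the reference category, which is what allows the graded, nonlinear relationship with mortality described later in the paper.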
We added manually abstracted clinical data to the automated clinical models to form the 39 manual clinical models. We considered only vital signs (systolic blood pressure, diastolic blood pressure, respiration, heart rate, and temperature) and altered mental status, which was assessed by the Glasgow Coma Scale or a charted designation of disorientation, stupor, or coma by the attending physicians. We did not include other manually collected clinical variables beyond vital signs and mental status, because previous studies have found that the contribution of these variables to model discrimination is negligible (Tabak, Johannes, and Silber 2007; Hollenbeak et al. 2008).
We compared changes in c-statistics when vital sign and mental status variables were added to the models. Because the c-statistic may be insensitive to differences in calibration between models and the traditional Hosmer–Lemeshow χ2 test is not suitable when the sample size is very large, we evaluated the change in model calibration using joint distributions of predicted mortality risk under the two sets of models (Cook 2007). This method allowed us to evaluate whether models with manually extracted data would more accurately stratify individuals into higher or lower mortality risk strata compared with models without these data.
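For readers unfamiliar with the c-statistic, it equals the probability that a randomly chosen death received a higher predicted risk than a randomly chosen survivor. A minimal sketch (not the study's code, which would operate on a fitted model's predictions):

```python
def c_statistic(y_true, y_prob):
    """Concordance (c) statistic for a binary outcome.

    Counts, over all death-survivor pairs, how often the death carries
    the higher predicted risk; ties count as half-concordant. This is
    equivalent to the area under the ROC curve.
    """
    deaths = [p for y, p in zip(y_true, y_prob) if y == 1]
    survivors = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                concordant += 1.0
            elif d == s:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))
```

On cohorts of a million discharges one would use a rank-based O(n log n) formulation rather than this O(n²) double loop, but the quantity computed is the same.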
We validated each model internally using bootstrapping in the derivation cohort by sampling with replacement for 200 iterations (Efron and Tibshirani 1993). Variables that never changed coefficient signs and were significant in more than 70 percent of iterations were retained in the model. For external validation, we recalibrated all models using 1,178,561 cases discharged in 2004–2005 because of the significant decrease in in-hospital mortality observed across years.
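The bootstrap screening rule described above (retain a variable only if its coefficient never changes sign and is significant in more than 70 percent of iterations) can be sketched as follows. Here `fit_model` is a hypothetical callable standing in for the study's regression fit; it is assumed to return `{variable: (coefficient, p_value)}` for one bootstrap sample.

```python
import random

def bootstrap_screen(fit_model, data, n_iter=200, sig_frac=0.70, alpha=0.05):
    """Bootstrap variable screening: resample the data with replacement,
    refit the model each time, and retain a variable only if its
    coefficient keeps a stable sign across all iterations and is
    significant in more than `sig_frac` of them.

    `fit_model` is a hypothetical stand-in for the actual model fit.
    """
    results = {}  # variable -> list of (coefficient, p_value)
    for _ in range(n_iter):
        sample = [random.choice(data) for _ in data]  # resample with replacement
        for var, (coef, p) in fit_model(sample).items():
            results.setdefault(var, []).append((coef, p))
    retained = []
    for var, fits in results.items():
        stable_sign = len({coef > 0 for coef, _ in fits}) == 1
        frac_significant = sum(p < alpha for _, p in fits) / len(fits)
        if stable_sign and frac_significant > sig_frac:
            retained.append(var)
    return retained
```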
We examined changes in the model-fit log-likelihood value when each group of variables was retained in, and then removed from, the full model (Escobar et al. 2008; Render et al. 2008). We calculated the relative contributions of age, laboratory results, ICD-9 code-based variables, and additional manually abstracted variables for each model.
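One common way to operationalize such a log-likelihood comparison (the paper does not give its exact formula, so this is an assumed formulation) is to express each group's contribution as the share of the full model's log-likelihood improvement over a null model that is lost when the group is removed:

```python
def relative_contribution(ll_full, ll_null, ll_without):
    """For each variable group, the fraction of the full model's
    log-likelihood improvement over the null model that disappears
    when that group is dropped and the model is refitted.

    ll_without maps group name -> log-likelihood of the refitted model
    with that group removed. One plausible formulation, not necessarily
    the study's exact calculation.
    """
    improvement = ll_full - ll_null
    return {group: (ll_full - ll) / improvement
            for group, ll in ll_without.items()}
```

Under this formulation, a group whose removal erases 43.2 percent of the improvement receives a relative contribution of 0.432.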
Large-scale implementation of a clinical risk adjustment system requires cost efficiency. Because electronic capture of vital signs and mental status may require more comprehensive implementation of electronic medical records, which is currently less available than electronically captured numerical laboratory data (Jha et al. 2009), we evaluated whether models without vital signs and mental status (automated clinical models) can serve as surrogates for models with these additional data, which currently require manual extraction at the majority of hospitals. Specifically, we fit two sets of hierarchical models to compare hospital performance (Normand et al. 1997; Tabak, Johannes, and Silber 2007). First, we obtained hospital rankings for each disease group using risk-standardized mortality rates generated from the automated clinical models. Second, we obtained another set of rankings using the manual clinical models. We used the Spearman rank correlation coefficient to assess the agreement. A high level of agreement between the two sets of results would suggest that the automated clinical models can be used as surrogates for the manual clinical models.
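The Spearman rank correlation used to compare the two sets of hospital rankings can be sketched as follows. This is a minimal, tie-free version; library routines such as `scipy.stats.spearmanr` additionally apply a tie correction.

```python
def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.

    Minimal sketch assuming no tied values. A value of 1.0 means the two
    rankings (e.g., hospital performance under two risk models) agree
    perfectly; -1.0 means they are exactly reversed.
    """
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0.0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```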
Overall, the median (interquartile range) age was 72 (57, 81) versus 71 (56, 81) years for the derivation versus validation cohorts, respectively. Approximately 45.4 percent of both cohorts were men. A total of 50.3 versus 58.3 percent of cases were from teaching hospitals and 15.6 versus 14.7 percent were from rural hospitals, respectively, for the two cohorts. The overall mortality was 4.6 percent (n=58,300) for the derivation cohort and 4.0 percent (n=47,279) for the validation cohort. Table 1 displays the distribution of patients and mortality by disease group for the derivation versus validation cohorts.
The most common mortality predictors across disease groups included age, albumin, BUN or creatinine, arterial pH, white blood cell (WBC) counts, blood glucose, sodium, hemoglobin, and other abnormal metabolic, or hematologic parameters (Table 2). The most common chronic conditions predicting mortality included metastatic cancer or cancer of major organ systems. The overall results were similar in the recalibrated validation cohort.
The average c-statistic for the automated models was 0.83 for the derivation cohort (Table 3). The addition of vital signs and mental status increased the average c-statistic to 0.85. It also improved model calibration when predicted mortality risk strata were evaluated in the joint distributions (Table 4). Models with vital signs and mental status reclassified 17.3 percent of cases into risk strata that more accurately represented observed mortality risks. For example, 57,483 cases in the 1–5 percent mortality risk stratum were reclassified into the <1 percent mortality risk stratum, which more accurately represented the observed mortality of 0.7 percent for these patients. It should be noted that 96.7 percent of reclassified cases shifted only to the immediately adjacent stratum.
Overall, the laboratory variables contributed most in predicting mortality, with an average relative contribution of 43.2 percent across all 39 models in the derivation cohort (Table 5). The next highest contributor in predicting mortality was age (17.4 percent). The ICD-9 code-based comorbidities, vital signs, and altered mental status each contributed about 10 percent, and the ICD-9 principal diagnosis group contributed about 7 percent. The results for the validation cohort were very similar.
The hospital performance ranks generated by automated clinical models were highly correlated with those generated by manual clinical models. The average Spearman rank correlation coefficient for hospital performance ranking was 0.97 for both the derivation and validation cohorts.
A clinically credible and low-cost risk adjustment system is important for comparative outcome studies and performance reporting. Numerical laboratory data are objective, precise, and parsimonious when used for risk adjustment. The finding that the same small set of numerical laboratory results can serve as the basis for excellent predictions of inpatient mortality for a large, diverse set of clinical conditions further opens the way to collect these data on all hospitalized patients for whom the tests are clinically indicated.
Why are the numerical laboratory results important in predicting mortality? First, biomarkers such as serum chemistry, blood cell counts, blood gases, and other metabolic and hematologic parameters provide objective assessments of organ system function. They minimize variations in assessments of patients' clinical conditions and eliminate the over- or under-coding issues that are of concern for variables based on diagnosis codes. Second, numerical laboratory data have a “dose–response” relationship with mortality; the farther a laboratory result deviates from the reference level, the higher the risk of death. This graded quantitative effect leads to more accurate differentiation of organ system dysfunction than the dichotomous variables captured in diagnosis codes. Third, a set of two dozen numerical laboratory tests encompasses assessment of the major organ system functions needed to keep patients alive, making the use of laboratory data for risk adjustment parsimonious and efficient from both scientific and economic perspectives. From a clinical perspective, some deranged laboratory findings might not have a one-to-one corresponding diagnosis code that captures the complete spectrum of clinical complexity seen in laboratory results. For example, an abnormally low albumin could indicate chronic malnutrition, liver failure, renal dysfunction, a secondary manifestation of cardiac dysfunction, or even acute severe sepsis, possibly due to capillary leakage of albumin. Identifying and classifying diagnosis codes to cover this broad spectrum of clinical conditions might be more arduous than directly using the laboratory test results themselves.
Our study built on previous studies of automated laboratory data (Tabak, Johannes, and Silber 2007; Escobar et al. 2008; Render et al. 2008) by extending previous research to both male and female patients admitted for a broad range of diseases to a group of acute care hospitals diverse in teaching status, bed size, and rural location. We used a large database consisting of administrative, numerical laboratory, and manually collected clinical data. We found that laboratory data contributed most in predicting mortality even when we included manually collected key clinical findings of vital signs and mental status. Although the absolute value of the relative contribution of the laboratory data was slightly smaller in our study than in previous studies that did not include additional manually collected clinical data as covariates (Escobar et al. 2008; Render et al. 2008), our finding is consistent with a previous publication that included additional variables in the models and used a different statistical method to calculate the relative contribution of laboratory data (Tabak, Johannes, and Silber 2007). These findings further validate the stability and consistency of objective, numerical laboratory data when applied to different patient populations across a wide array of disease groups and over time.
We found that vital signs and altered mental status added an average of 0.02 in c-statistic above and beyond the automated models. The small cumulative c-statistic increase was in line with a previous study on eight clinical conditions (Pine et al. 2009). However, adding vital signs and mental status improved model calibration in the joint distribution analysis, which was not investigated previously. The incremental improvement in calibration might be particularly meaningful if risk stratification is of interest (Cook 2007). Furthermore, these physiologic variables have clinical face validity as mortality risk factors. With the increasing use of full electronic medical records, automated collection, storage, and transmission of voluminous vital signs will likely become more practical. Hence, our findings may have policy implications for setting the next priority of inclusion of electronic clinical data for health services research.
The finding that altered mental status contributed more than laboratory data in predicting mortality among patients with neurologic disorders such as ischemic and hemorrhagic stroke is clinically plausible. From a clinical perspective, laboratory results do not necessarily capture neurologic function. Further study on using “present on admission” (POA) indicators for ICD-9 diagnosis codes indicating “coma” may shed light on a practical way to electronically capture and utilize information on altered mental status on admission for risk adjustment, especially for diseases of the neurological system. It should be noted that current coding conventions may preclude coding of signs and symptoms that are an “integral part of” or “associated routinely with a disease process” (CMS 2009). Our finding on the importance of altered mental status in risk adjustment may aid responsible parties in discussing and considering clarifying and modifying rules so that clinically important signs and symptoms, such as “coma,” can be consistently coded across hospitals.
Adoption of a full electronic medical record system that enables interhospital collection, storage, and transmission of vital signs and mental status throughout the United States will likely take time. Hence, a system using data that are already captured electronically across the vast majority of hospitals would have practical value. Our analysis showed that hospital performance ranks generated by automated clinical models (numerical laboratory data and administrative data) were highly correlated with those generated by models with additional vital signs and mental status. As a bridge, hybrid models incorporating the most widely automated numerical laboratory results and information from administrative data may serve as a reasonable intermediate step for aggregated performance reporting.
Our study has limitations. How best to group a heterogeneous patient population into clinically homogeneous subgroups is debatable. Multiple clinical grouping systems are currently in use (Pine et al. 2007; Tabak, Johannes, and Silber 2007; Escobar et al. 2008; Elixhauser, Steiner, and Palmer 2010). The Clinical Classifications Software (CCS) recently updated by the Agency for Healthcare Research and Quality consists of 285 diagnosis groups (Elixhauser et al. 2010). Although the CCS system offers granularity in grouping patients into homogeneous diagnosis-related groups, it would require an even larger database than we currently have to ensure an adequate number of cases and outcome events for model development and validation, especially for low-volume disease groups. Implementing a more granular disease grouping system, such as the CCS, for clinical risk adjustment modeling may be achievable in the future if a nationwide automated clinical database is established for health services research. Our study provides further evidence in support of establishing such a national database to advance health services research.
The methodology surrounding the use of numerical laboratory data in risk adjustment modeling also varies. Our disease-specific modeling approach rested on three considerations. (1) It was based on review of disease-specific risk adjustment tools for a general inpatient population published by the clinical community, which showed differences in variable selection and in the weight of the same variable across risk adjustment models for patients hospitalized for different clinical conditions (Goldman et al. 1996; Fine et al. 1997; Fonarow et al. 2005; Aujesky et al. 2006; Tabak, Johannes, and Silber 2007; Tabak et al. 2009). (2) Our empirical review of the distribution of each variable in relation to the outcome by disease group showed significant differences across disease groups. For example, for patients with WBC counts (10⁹/L) of ≤4.3, 4.4–10.9, 11.0–14.1, 14.2–19.8, or ≥19.9, the corresponding observed inpatient mortality was 7.8, 3.6, 4.5, 5.8, or 7.6 percent when pneumonia was the principal diagnosis, whereas, for the same laboratory findings, the corresponding mortality for patients with chronic obstructive pulmonary disease (COPD) was 1.7, 1.7, 2.7, 4.0, and 5.5 percent. These data revealed that neutropenia (WBC ≤4.3 [10⁹/L]) was associated with the highest mortality risk for pneumonia patients but not for COPD patients, for whom mortality was the same and lowest (1.7 percent) whether the WBC was below or within the normal range. (3) Feedback from our clinical advisory panels favored disease-specific models for ease of understanding of risk factors and their relative weights pertinent to caregivers' specialties. Our finding that risk factors and their relative weights vary depending upon the clinical conditions being considered supports this viewpoint.
The disease-specific modeling approach differs from the generic modeling approaches used by other researchers, such as APACHE IV (Zimmerman et al. 2006) and the Kaiser Permanente risk adjustment system (Escobar et al. 2008), in which a generic physiological score using numerical variables is devised first and the aggregated physiology score is then entered into the multivariable model with other variables, including disease groups. The generic method has merits: it requires only a reasonably sized database for model development and validation, and it might be easier to disseminate and implement. In contrast, development and validation of a disease-specific risk adjustment system requires a very large database, and the application of such a system may necessitate the incorporation of more complex electronic systems. Although a direct comparison of these two modeling approaches from a statistical perspective might be interesting, it is beyond the scope of the current study. Perhaps more pertinent to health services research is the fact that both modeling approaches yielded convergent results on the importance of numerical laboratory and vital sign data in risk adjustment, which provides compelling evidence for policy makers in setting priorities for health care information technology in capturing and utilizing these numerical data.
Our comorbidity variables using secondary diagnoses did not reflect the recent coding change identifying acute clinical conditions present on admission (POA). Future studies may directly examine the consistency, reliability, and validity of POA coding in administrative data, as well as the relative contribution of these new data in relation to electronically captured numerical laboratory and vital sign data when they all become widely available. When evaluating the value of these data, it is important to balance objectivity, parsimony, and cost, in addition to statistical performance.
A small number of laboratory findings provide objective, quantitative, and parsimonious measures of the risk of inpatient mortality in a large array of clinical conditions. Clinical models using electronic numerical laboratory and administrative data can be used for population-based comparative outcome studies and hospital performance reporting. Vital signs and mental status should be included in the automated risk adjustment systems when the electronic collection, storage, and transmission of these data become widely available. Based on automated data, these models are cost-efficient to implement as a risk adjustment system.
Joint Acknowledgment/Disclosure Statement: An abstract based on preliminary results of this manuscript was selected as one of the “most outstanding” abstracts and was presented at the Academy Health 25th Annual Research Meeting on June 9, 2008, Washington, DC. The slide presentation was posted on the Academy Health website.
All authors were previous employees at Cardinal Health. Y. P. T., X. S., K. G. D., and R. S. J. reported current employment at CareFusion. S. G. K. reported current employment at Massachusetts Peer Review Organization. R. S. J. also reported employment at the Division of Gastroenterology, Brigham and Women's Hospital and Harvard Medical School.
We would like to thank Linda Hyde at CareFusion for her technical support. We acknowledge many helpful and constructive comments from the two anonymous reviewers.
Additional supporting information may be found in the online version of this article:
Appendix SA1: Author Matrix.
Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.