To assess whether performance indicators based on administrative hospital data can be rendered more useful by stratifying them according to risk status of the patient.
Retrospective analysis of 10 years of administrative hospital data for patients with acute myocardial infarction (AMI). Four risk groups defined by cross‐classifying patient age (<75 years, 75+ years) against the presence or otherwise of at least one risk condition that predicted short‐term AMI mortality.
17 public hospitals in Queensland, Australia, with more than 50 AMI admissions annually.
21537 patients admitted through the emergency department and subsequently diagnosed as having AMI.
Systematic variation in standardised case fatality ratios. Systematic variation is the variation across hospitals after accounting for the Poisson variation in the number of deaths at each hospital. It was obtained from an empirical‐Bayes model. Case fatality ratios were standardised according to the age, sex and risk factor profile of the patient.
Systematic variation decreased monotonically across the four risk groups as case fatality increased (likelihood ratio test: χ2=8.08, df=1, p=0.004). Systematic variation was largest and statistically significant (0.375; 95% CI 0.144 to 0.606) for low‐risk patients (<75 years with no risk conditions; case fatality rate=2.0%) but was smallest (0.126; 0.039 to 0.212) for high‐risk patients (75+ years with at least one risk condition; case fatality rate=24.3%).
Analysis of data from high‐risk patients with AMI provides little opportunity to identify better‐performing hospitals because there is relatively little variation across hospitals. In such patients, older age and comorbid illness are probably more important than quality of care in determining outcomes. In contrast, for low‐risk patients the systematic variation was large suggesting that outcomes for such patients are more sensitive to clinical error. Analysing data for low‐risk patients maximises our ability to identify best‐performing hospitals and learn from their processes and structures to effect system‐wide changes that will benefit all patients.
In spite of criticisms, there is continuing worldwide interest in using administrative hospital data to construct quality indicators.1 The current issues are not so much about whether such indicators should be used but more about how to render them more useful.2 Most of the indicators based on administrative hospital data consist of outcome‐condition pairs (eg, mortality (outcome) following admission for acute myocardial infarction (condition)). Large variation across hospitals for specific indicators provides an opportunity to discern high‐performing from low‐performing hospitals. After identifying high‐performing hospitals, their structures and processes can be analysed to formulate system‐wide changes that can be applied to all hospitals to optimise quality and reduce undesirable variation. The idea is not to single out individual hospitals as “bad apples”, but to identify ways of shifting all hospitals to a level of quality observed in those with the best outcomes.3,4 This is measurement for the purpose of learning, not judging,5 and embodies Deming's approach to quality improvement as applied to healthcare.6
The present study aimed to assess whether there were differences in the amount of variation across hospitals in quality indicators according to the risk status of the patient. If so, then stratifying by risk might be a useful approach to analysing administrative hospital data because it would maximise the opportunity to identify best‐performing hospitals. We chose the outcome‐condition pair of inhospital mortality following acute myocardial infarction (AMI) for our case study because several studies have shown a more consistent association between quality of care and AMI mortality than for some other indicators based on administrative hospital data.7,8
Data on patients admitted through the emergency departments of Queensland's public hospitals, who were subsequently diagnosed as having an AMI, were obtained from the Queensland Hospitals Admitted Patient Data Collection (QHAPDC). Queensland is the north‐eastern state of Australia with a population of 4.0 million, which represents 18% of the total Australian population.
QHAPDC contains, inter alia, the demographic characteristics of the patients, the principal diagnosis, secondary diagnoses and the procedures performed. The data have been collected and stored in a consistent format since the financial year 1995/96 and we used the full 10 years of available data to 2004/05 to obtain statistically stable estimates of across‐hospital variation in mortality. Patients with AMI were identified using International Classification of Diseases, 9th revision (ICD‐9) codes 410x (for the years 1995/96–1998/99)9 and ICD‐10 codes I21x–I22x (1999/00–2004/05).10 We chose the 17 public hospitals in Queensland with more than 50 AMI admissions annually (range 51–253); the mean for all hospitals combined was 128.
We applied two exclusion criteria: age younger than 30 years or older than 89 years, and discharge status of alive with length of stay less than 4 days. Other studies have shown these criteria to be optimal for reducing the number of false‐positive diagnoses of AMI in administrative hospital data.11
To assess whether there was more across‐hospital variation in mortality for low‐risk or high‐risk patients, we defined four risk groups based on patient age (<75 years, 75+ years) and the presence or otherwise of at least one risk condition identified in other studies11,12 (and confirmed in our data) as predicting short‐term mortality for AMI (table 1).
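The cross‐classification can be sketched as a simple grouping rule (illustrative Python only; the actual list of risk conditions is the one given in table 1 of the paper):

```python
def risk_group(age, risk_conditions):
    """Assign one of the four risk groups by cross-classifying age (<75 v 75+)
    against the presence of at least one risk condition.
    Illustrative only: the real condition list is the one given in table 1."""
    age_band = "<75" if age < 75 else "75+"
    risk_band = "1+ risk conditions" if len(risk_conditions) >= 1 else "no risk conditions"
    return f"{age_band}, {risk_band}"

# Example: a 68-year-old with no recorded risk conditions falls in the low-risk group
print(risk_group(68, []))   # -> "<75, no risk conditions"
```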
For each of the four risk groups, we fitted logistic regression models (with 30‐day inhospital mortality as the outcome) to adjust for age, sex and risk conditions (if these were present). Risk conditions (listed in the footnote to table 1) were fitted as separate indicator variables and age was fitted to the models in 5‐year groups, also using indicator variables. The c‐statistics for these models ranged between 0.76 and 0.82. Statisticians generally consider that acceptable risk adjustment is indicated by a c‐statistic greater than 0.70.13
We summed the predicted probabilities of death from the logistic regression models, within each hospital, to obtain the number of deaths (Dexp) that would be expected at each hospital if it had the same casemix (ie, age, sex and risk factor profile) as the average casemix for all hospitals combined.
The standardised case fatality ratio (SCFR) at each hospital was calculated as: Dobs/Dexp; where Dobs is the observed number of deaths at each hospital. SCFRs vary about 1; if a hospital has a higher mortality rate than predicted based on the age, sex and risk conditions of its patients, then its SCFR will be greater than 1. This statistical method is a standard way of constructing performance indicators that attempt to account for differences in the types of patients admitted to different hospitals.14
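The Dobs/Dexp calculation can be sketched as follows, using hypothetical simulated data; the plain logistic fit stands in for the risk‐adjustment models described above, and the hospital identifiers and coefficients are invented for illustration:

```python
import numpy as np

# Hypothetical simulated data: X holds an intercept plus age/sex/risk-condition
# indicators, y is 30-day in-hospital death, hospital_id is the admitting hospital.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.integers(0, 2, size=(n, 4))]).astype(float)
true_beta = np.array([-3.5, 1.2, 0.3, 0.8, 0.6])   # invented coefficients
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta))))
hospital_id = rng.integers(0, 17, size=n)

# Fit the logistic risk-adjustment model by Newton-Raphson (plain numpy)
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = p * (1.0 - p)
    beta += np.linalg.solve((X * W[:, None]).T @ X, X.T @ (y - p))
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))

# SCFR per hospital: observed deaths (Dobs) over the sum of the predicted
# probabilities of death within that hospital (Dexp)
scfr = {}
for h in range(17):
    mask = hospital_id == h
    scfr[h] = y[mask].sum() / p_hat[mask].sum()
```

A useful property of the unpenalised logistic fit is that the predicted probabilities sum to the observed number of deaths overall, so the SCFRs average to about 1 across hospitals.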
Estimation of the variation across hospitals should not be based directly on the observed SCFRs because these include effects of random Poisson variation within hospitals and consequently will overestimate the variation across hospitals.15 Instead, we used empirical‐Bayes models to partition the variation in the observed SCFRs into: (1) that due to chance variation (ie, Poisson) in the observed number of deaths at each hospital, and (2) variation in the true underlying SCFRs, which we have labelled as systematic variation, and was the focus of our analysis. Systematic variation has been used and validated in several studies.4,16
The larger the systematic variation, the larger the true underlying variation in SCFR. In contrast, if the systematic variation is small and close to zero, then all the hospitals have approximately the same value for their true underlying SCFRs and any difference in the observed SCFRs is due to random variation in the number of deaths at each hospital.
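The decomposition can be illustrated with a small simulation (all numbers hypothetical): the spread of the observed log SCFRs is inflated by within‐hospital Poisson noise relative to the true systematic spread, which is why the observed SCFRs should not be used directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n_hosp, n_rep = 17, 2000
sv_true = 0.3                                   # assumed true systematic variation
d_exp = rng.uniform(20.0, 120.0, size=n_hosp)   # hypothetical expected deaths

obs_var = []
for _ in range(n_rep):
    log_scfr_true = rng.normal(0.0, sv_true, size=n_hosp)
    # Observed deaths are Poisson about each hospital's true underlying rate
    d_obs = np.maximum(rng.poisson(d_exp * np.exp(log_scfr_true)), 1)
    obs_var.append(np.var(np.log(d_obs / d_exp), ddof=1))

# On average the observed variance exceeds the systematic component sv_true**2
# because it also contains the within-hospital Poisson variation.
print(np.mean(obs_var), sv_true**2)
```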
All empirical‐Bayes modelling was done in the statistical package Stata (Release 9.2) using programs which implement maximum likelihood estimation and empirical‐Bayes prediction for generalised linear latent and mixed models (GLLAMM).17 To estimate the systematic variation for the four risk groups, we fitted four random intercept models, one for each risk group. They each had the form:
ln(Dobs) = ln(Dexp) + β0 + ξ0j
Dobs and Dexp are as previously defined, with Dobs assumed to have a Poisson distribution with mean exp[ln(Dexp) + β0 + ξ0j];
β0 is the mean log SCFR;
ξ0j ~ N(0,SV2) is the random intercept, with the j subscript corresponding to the 17 hospitals (SV, systematic variation).
The GLLAMM program provides an estimate of SV2 and its standard error (SE). We used the delta method18 to obtain the standard error of the systematic variation (SV = √SV2) and used this to calculate 95% confidence intervals in the usual way (ie, estimate ±1.96×SE). The lower 95% confidence limit is bounded at zero because the systematic variation, like any measure of variation, cannot be negative.
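The delta‐method step is short enough to sketch directly: for g(x) = √x the derivative is 1/(2√x), so SE(SV) ≈ SE(SV2)/(2×SV). The helper below is illustrative; the paper's estimates came from GLLAMM output.

```python
import math

def sv_confint(sv2, se_sv2):
    """Delta-method 95% CI for SV = sqrt(SV2).
    g(x) = sqrt(x) has derivative 1/(2*sqrt(x)), so SE(SV) = SE(SV2)/(2*SV).
    The lower limit is truncated at zero, as variation cannot be negative."""
    sv = math.sqrt(sv2)
    se_sv = se_sv2 / (2.0 * sv)
    lo = max(0.0, sv - 1.96 * se_sv)
    hi = sv + 1.96 * se_sv
    return sv, lo, hi

# Example with made-up inputs: SV2 = 0.25 with SE 0.1
print(sv_confint(0.25, 0.1))   # -> (0.5, 0.304, 0.696)
```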
To test for a statistically significant relationship between systematic variation and the risk of death, we fitted a GLLAMM to the entire dataset and specified random coefficients for risk group and a random intercept and compared it with a GLLAMM with a random intercept alone using a likelihood ratio test.19 That is, we compared the model
ln(Dobs) = ln(Dexp) + β0 + ξ0j + ∑(β1k + ξkj)·xk
with the model
ln(Dobs) = ln(Dexp) + β0 + ξ0j + ∑β1k·xk
where:
xk denotes the three indicator variables used to model the four risk groups (reference category: <75 years; no risk conditions);
β1k are the mean coefficients corresponding to the indicator variables;
ξkj ~ N(0,ψk2) are the random coefficients, with the k subscript corresponding to the three indicator variables and the j subscript corresponding to the 17 hospitals.
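Given the two maximised log‐likelihoods, the likelihood ratio comparison itself is standard and can be computed as below (the log‐likelihood values in the example are invented for illustration):

```python
from scipy.stats import chi2

def lr_test(ll_full, ll_reduced, df):
    """Likelihood ratio test for nested models:
    statistic = 2*(llF - llR), referred to a chi-squared distribution on df."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2.sf(stat, df)

# Hypothetical log-likelihoods giving a statistic of 8.08 on 1 df
stat, p = lr_test(-100.0, -104.04, df=1)
```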
The primary analysis, which was powered to definitively assess how systematic variation changed with risk status, was based on 10 years of data for all 17 hospitals combined. We also conducted secondary analyses over 2‐year periods to compare the systematic variation by hospital peer group and identify best (and worst)‐performing hospitals. For the analysis by hospital peer group, the four tertiary hospitals were compared with 13 community hospitals.
To identify the best‐performing and worst‐performing hospitals (ie, low and high outliers) we used the estimates of the underlying SCFRs from the empirical‐Bayes models. Following the terminology of others,20 we refer to these estimates as shrunken SCFRs because the effect of the empirical‐Bayes models is to shrink the observed SCFRs towards the state average. The amount of shrinkage depends on the number of AMI admissions at a particular hospital: if there are fewer admissions, there is greater shrinkage towards the state average. The technique can be thought of as combining an individual hospital's rate with the combined rate for all hospitals to obtain a better estimate of the hospital's true underlying rate. Several authors give examples of this shrinkage, which is a way of accounting for small sample sizes at some hospitals.15,21,22,23 There is no standard, widely agreed method for using shrunken SCFRs to identify outlying hospitals.24 Following Austin, we classified a hospital as a high outlier if, based on the posterior distribution, P(shrunken SCFR >1.15) >0.75; or as a low outlier if P(shrunken SCFR <0.85) >0.75.24 Others have also used this method.15
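The outlier rule can be sketched as follows, assuming (purely for illustration) a normal posterior for the log shrunken SCFR; in practice the posterior comes from the fitted empirical‐Bayes model:

```python
import math

def classify(post_mean_log, post_sd_log):
    """High outlier if P(SCFR > 1.15) > 0.75; low outlier if P(SCFR < 0.85) > 0.75.
    Assumes a normal posterior on the log scale (an illustrative simplification)."""
    # Standard normal CDF via the error function (avoids a scipy dependency)
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    p_high = 1.0 - cdf((math.log(1.15) - post_mean_log) / post_sd_log)
    p_low = cdf((math.log(0.85) - post_mean_log) / post_sd_log)
    if p_high > 0.75:
        return "high outlier"
    if p_low > 0.75:
        return "low outlier"
    return "not an outlier"

# A hospital whose posterior is centred on SCFR = 1.5 with little uncertainty
print(classify(math.log(1.5), 0.1))   # -> "high outlier"
```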
There were 21537 AMI admissions to the 17 largest public hospitals in Queensland during the 10‐year study period that met the selection criteria. During this time the annual number of AMI admissions remained relatively stable at 2071 in 1995/96 and 2234 in 2004/05, while the crude case fatality rate decreased from 13.7% to 10.4%; the combined rate for all years was 12.1%.
The case fatality rate varied greatly across the four risk groups: from 2.0% for AMI patients younger than 75 years with no risk conditions to 24.3% for AMI patients aged 75 years or older with at least one risk condition (table 1). The point estimate of the systematic variation was largest (0.375) for the group with the lowest (2.0%) case fatality rate (ie, <75 years and no risk conditions). The systematic variation in the three other groups monotonically decreased as the case fatality rate increased (fig 1). This inverse relationship between systematic variation and the risk of death was statistically significant (χ2=8.08, df=1, p=0.004, based on a likelihood ratio test comparing the model with random coefficients for risk group and a random intercept with the model with a random intercept alone).
For the secondary analysis of hospital peer groups for the two most recent years, systematic variations in each of the four risk groups were so imprecise (extremely wide confidence intervals) as to not provide any useful information. Therefore, we collapsed the four risk groups into two according to whether the patient had at least one risk condition. For all hospitals combined (last three rows in table 2), systematic variation for the lower‐risk group was 0.359 (95% CI 0.087 to 0.632) and was larger than that for the higher‐risk group (0.121; 95% CI 0.000 to 0.285), although the difference between groups was not statistically significant (p=0.168). Similarly, when stratified by peer group the data showed the same pattern of higher systematic variation in the lower‐risk group, although with smaller sample sizes we could not show these results to be statistically significant (table 2).
For the analysis of outliers we also collapsed the two low‐risk groups into one (table 3). More outliers were identified using the collapsed low‐risk group than using all patients combined. Of the 17 outliers identified in the low‐risk group, 7 (41%) were also identified in the combined group; these 7 concordant outliers represented 64% of the 11 outliers identified for all patients combined.
There was more variation in mortality across hospitals for low‐risk than for high‐risk patients with AMI. Analysis of data for low‐risk patients might therefore facilitate the process of identifying best‐performing hospitals and learning from their structures and processes to effect system‐wide change that will benefit all patients.
More low or high outliers were identified using the low‐risk group than using all patients combined. This was not surprising because the larger systematic variation for the low‐risk group means that even after taking into account the random variation related to the number of deaths at each hospital, there was still more variation across hospitals and consequently more outliers in the low‐risk group than in the other groups.
We could find only one other published analysis of variation across hospitals by risk status: a Canadian database study of AMI mortality.25 The main focus of that study was to compare the performance of bayesian and frequentist methods in identifying outliers. In a secondary analysis, the authors used bayesian hierarchical models to specify three fictitious patient profiles: (1) low risk: a man aged 50–64 years, with no additional risk factors, (2) medium risk: a patient aged 65–74 years, all of whose risk factors were set to the cohort average, (3) high risk: a woman aged at least 75 years with chronic renal failure, heart failure and cardiac dysrhythmias. For each risk profile, the probability of mortality at each of 139 hospitals was calculated. Similar to our results, they found the greatest across‐hospital variation for the low‐risk group.
Our analyses did not address any of the possible systematic errors (biases) that can occur in analyses of large administrative databases.26 For example, we had information on age, sex, heart failure and dysrhythmias, but we did not have information on potentially important variables such as smoking or obesity. Also, low‐risk patients were defined as those with no reported risk conditions and it is possible that some of these patients had unreported risk conditions. Also, there might be differences across hospitals in the diagnostic threshold for AMI, especially with the more widespread use of troponin assays.27 Because of problems such as these, analyses of administrative data are best viewed as a way of screening large routinely maintained data systems; they are not definitive.26 Instead, their aim is to suggest avenues for more in‐depth analyses and use the lessons subsequently learnt to improve quality of care. Along these lines, Mohammed and co‐workers have suggested the pyramid of investigation model, which considers data problems and residual differences in casemix (as discussed in the preceding paragraph) before attributing any variation across hospitals to variation in quality of care.28 Despite these limitations, hospital‐specific quality indicators are often based on administrative data2 and we were interested in how to maximise the usefulness of such data.
Another limitation of the present study is that a large sample size (ie, 10 years of data) was required to properly assess whether systematic variation decreased monotonically as risk increased across the four groups. The aim of routine quality measurement is to instigate timely remedial action in clinical care and a shorter study period is preferred. Our secondary analyses, based on 2 years of data, showed relatively more variation for the low‐risk group.
Medical errors are more prevalent among elderly patients, who are frail and more likely to have complicating, comorbid conditions.29 Although a potentially attractive strategy for reducing clinical error is to target patients at high risk of adverse outcomes, it does not necessarily follow that we have the most to learn about improving systems of care by analysing data for this group exclusively. There was so little variation across hospitals for high‐risk patients that analysing this patient subpopulation would afford little opportunity for learning to improve quality. It is likely that for high‐risk patients, factors such as advanced age and comorbid conditions overwhelm variation in quality of care in determining outcomes. In contrast, low‐risk patients showed the greatest variation in mortality across hospitals, suggesting that the outcome for this patient group is particularly sensitive to the prevalence of clinical error. Analysing quality indicators for low‐risk patients might therefore afford a better chance of identifying and learning from hospitals which have better outcomes.
This idea is an extension of the idea behind quality indicators based on diagnosis‐related groups (DRGs) with extremely low mortality.30 Applying this idea to a particular condition (ie, AMI), rather than a group of heterogeneous DRGs, might provide more specific opportunities for learning.
Analysis of administrative hospital data after first stratifying by patient risk status is a more useful method for identifying best‐performing hospitals than using quality indicators based on all patients combined. Learning from best‐performing hospitals identified in this manner might contribute substantially to our knowledge and ability to improve care for all patients.
MC did the statistical analysis. IS conceived the study. Both authors wrote, edited and critically reviewed the manuscript. MC is the guarantor.
AMI - acute myocardial infarction
GLLAMM - generalised linear latent and mixed models
SCFR - standardised case fatality ratio
Funding: This work was completed using resources routinely available at the authors' workplace. There was no external funding. The authors' employers had no role in the study design, analysis or interpretation.
Competing interests: None.
Ethical approval: The study was based on administrative hospital data and there was no contact with patients. No information that could identify an individual patient was used in the analysis. The Queensland Department of Health, the data custodian, did not require ethics approval for the study.