Hospital Episode Statistics (HES) data cover all admissions to NHS hospitals in England and contain a field that allows separate admission spells for the same patient to be linked. Data were extracted for the financial years 1999/2000 (April 1999 to March 2000) to 2003/2004 (April 2003 to March 2004). All emergency admissions in 2000/2001 were sorted by patient and date of admission, and the first admission for each patient was taken as the ‘index spell’. Patients who died at the end of this first admission were excluded. For each patient, we then counted the number of further emergency admissions in two periods measured from the date of admission of the index spell: first, between 0 and 365 days and, second, between 366 days and 36 months. High-impact users were defined as those patients with at least two further emergency admissions within a year (i.e. three or more emergency admissions, including the index spell, in a 12-month period).
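The index-spell selection and follow-up counting described above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes one record per emergency spell carrying a patient identifier, an admission date and a flag for death at the end of the spell (the `Spell` type and its field names are hypothetical), and it approximates 36 months as 3 × 365 days.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

INDEX_YEAR_START = date(2000, 4, 1)  # financial year 2000/2001
INDEX_YEAR_END = date(2001, 3, 31)
ONE_YEAR = 365
THIRTY_SIX_MONTHS = 3 * 365  # assumption: 36 months approximated as 1095 days


@dataclass(frozen=True)
class Spell:
    """Hypothetical record for one emergency spell."""
    patient_id: str
    admitted: date
    died_in_spell: bool  # True if the spell ended in death


def follow_up_counts(emergency_spells):
    """Return {patient_id: (n_0_365, n_366_36m, is_high_impact)} for 2000/2001 index spells."""
    by_patient = defaultdict(list)
    for s in emergency_spells:
        by_patient[s.patient_id].append(s)

    result = {}
    for pid, spells in by_patient.items():
        spells.sort(key=lambda s: s.admitted)  # stable sort keeps same-day order
        # Index spell: first emergency admission falling in 2000/2001.
        idx = next((i for i, s in enumerate(spells)
                    if INDEX_YEAR_START <= s.admitted <= INDEX_YEAR_END), None)
        if idx is None or spells[idx].died_in_spell:
            continue  # no index spell, or patient died at the end of it
        index_date = spells[idx].admitted
        early = late = 0
        for s in spells[idx + 1:]:  # only admissions after the index spell
            gap = (s.admitted - index_date).days
            if gap <= ONE_YEAR:
                early += 1
            elif gap <= THIRTY_SIX_MONTHS:
                late += 1
        # High-impact user: at least two further emergencies within 365 days.
        result[pid] = (early, late, early >= 2)
    return result


spells = [
    Spell("A", date(2000, 5, 1), False),
    Spell("A", date(2000, 9, 1), False),
    Spell("A", date(2001, 2, 1), False),
    Spell("A", date(2002, 1, 1), False),
    Spell("B", date(2000, 6, 1), True),  # died at end of index spell: excluded
]
print(follow_up_counts(spells))  # → {'A': (2, 1, True)}
```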
The number of emergencies in the 365 days before the index spell was also calculated and added to the data set. Other variables added to the data set included:
- the Charlson index of comorbidity (originally developed to predict 1-year mortality, it assigns weights to the presence of conditions such as diabetes and malignancy), calculated from ICD-10 diagnosis codes9
- age
- ethnicity
- whether the main diagnosis was an ambulatory care sensitive condition (these are listed in )
- two area-level socio-economic indices: MOSAIC type10 and the Index of Multiple Deprivation 2004 score, with areas grouped into fifths of equal population11
- the log-transformed age-sex standardized emergency admission ratio of the patient's electoral ward of residence
- the number of consultant episodes in the index spell, the sex of the patient and the source of the hospital admission ().

[Table: Subsequent emergency spells and deaths in 3 years of follow-up by factor relating to index spell in 2000/2001]
[Table: Covariates used to predict high-impact user status, in descending order of importance in explaining the variation]
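Grouping areas into "fifths with equal population" means weighting by population rather than counting areas. The sketch below is one way to do this, assuming areas are indivisible (so fifths are only approximately equal), that a higher IMD 2004 score means greater deprivation, and that each area is placed by the midpoint of its cumulative population; the function name and input layout are hypothetical.

```python
def population_weighted_fifths(areas):
    """areas: {area_code: (imd_score, population)} -> {area_code: fifth}.

    Fifth 1 holds the least deprived areas (lowest scores), fifth 5 the most deprived.
    """
    total = sum(pop for _, pop in areas.values())
    ordered = sorted(areas.items(), key=lambda kv: kv[1][0])  # ascending IMD score
    fifths = {}
    cumulative = 0
    for code, (_, pop) in ordered:
        # Place each area by the population midpoint it occupies in the ranking.
        midpoint = cumulative + pop / 2
        fifths[code] = min(5, int(5 * midpoint / total) + 1)
        cumulative += pop
    return fifths


# Five hypothetical areas of equal population fall one into each fifth.
print(population_weighted_fifths({c: (s, 100) for c, s in
                                  zip("abcde", [10, 20, 30, 40, 50])}))
```

With unequal populations a large area can span a boundary; the midpoint rule simply assigns it to the fifth containing most of its residents' ranks.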
The patients were then randomly divided into two groups of equal size: a ‘training’ data set, from which to develop a model predicting the likelihood of patients becoming high-impact users, and a ‘validation’ data set on which to test the model. Parameter estimates from the two halves of the data were compared, and model fit was assessed by inspecting residuals in the usual way.12
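A random split into two equal halves can be sketched with the standard library. The seed value and function name are hypothetical; fixing the seed only makes the split reproducible. With an odd number of patients the validation half receives the extra patient.

```python
import random


def split_half(patient_ids, seed=2000):
    """Randomly split patient IDs into (training, validation) halves."""
    ids = sorted(patient_ids)  # deterministic starting order before shuffling
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


training, validation = split_half(range(10))
print(len(training), len(validation))
```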
For patients in the training data set, logistic regression models were developed with high-impact user status as the outcome and, as covariates, the variables listed in (in descending order of importance to the model fit). Ambulatory care sensitive conditions were included in the model because these are thought to be amenable to intervention in primary care.13
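To make the modelling step concrete, the sketch below fits a logistic regression by Newton-Raphson (iteratively reweighted least squares) in plain Python. It is illustrative only: the paper does not say which software was used, the function names are hypothetical, and in practice a statistics package would be used for millions of records. The example checks a known property: with one binary covariate, the fitted slope equals the log odds ratio.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit. X rows exclude the intercept, which is added here.

    Returns [intercept, coef_1, ..., coef_k] on the log-odds scale.
    """
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                   # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict(beta, x):
    """Predicted probability of the outcome for one covariate vector x."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical data: 2/10 events when x = 0, 5/10 when x = 1.
X = [[0]] * 10 + [[1]] * 10
y = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5
beta = fit_logistic(X, y)
print(beta, math.log(4))  # slope should match the log odds ratio, log(4)
```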
We calculated the age-sex standardized admission ratio for all emergency admissions between 2001/2002 and 2003/2004 for the patient's electoral ward of residence, to try to adjust for differing admission thresholds at patients' local hospitals. Two other area-level variables, based on the patient's postcode of residence, were added: lifestyle group (MOSAIC type) and deprivation fifth. In addition, age, sex, ethnicity and source of admission were included in the model.
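The ward-level standardized admission ratio can be illustrated with indirect standardization: expected admissions are obtained by applying reference age-sex admission rates to the ward's own population, and the ratio is observed divided by expected (conventionally multiplied by 100). This is a sketch under stated assumptions: the paper does not name its reference population, and the function names, stratum labels and figures below are hypothetical. The factor of 100 only shifts the logarithm by log(100).

```python
import math


def standardized_admission_ratio(ward_pop, ward_observed, ref_pop, ref_admissions):
    """Indirectly standardized admission ratio (x100) for one ward.

    ward_pop: {stratum: ward population}; ward_observed: observed emergency admissions
    in the ward; ref_pop / ref_admissions: {stratum: count} for the reference population.
    Strata are hypothetical (age_band, sex) tuples.
    """
    # Expected admissions if the ward experienced the reference age-sex rates.
    expected = sum(pop * ref_admissions[s] / ref_pop[s] for s, pop in ward_pop.items())
    return 100.0 * ward_observed / expected


def log_sar(sar):
    """Log transform used as a covariate; undefined for wards with no admissions."""
    return math.log(sar)


ref_pop = {("0-64", "F"): 1000, ("65+", "F"): 100}
ref_adm = {("0-64", "F"): 10, ("65+", "F"): 20}
ward = {("0-64", "F"): 200, ("65+", "F"): 50}
sar = standardized_admission_ratio(ward, 18, ref_pop, ref_adm)
print(sar, log_sar(sar))  # expected = 2 + 10 = 12, so SAR = 150.0
```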
We defined this model as model A and created two further models. The first (model B) restricted the analysis to index spells whose main diagnosis was a condition most amenable to case management, again aiming to predict at least two further emergency admissions in the subsequent year. The covariates were the same as in model A.
The third model (model C) was the same as model A except that it aimed to predict patients who had at least two further emergency admissions within 365 days of the index admission and who did not die during this period. To capture all deaths, not just those occurring in hospital, we used a linked mortality file, which attaches a date of death, where applicable, to each patient record based on linkage with Office for National Statistics death registrations.
Patients were followed up for 3 years, using HES and the linked mortality file for 2000/2001 to 2003/2004, to obtain the number of subsequent emergency admissions (both in total and for conditions most amenable to case management) and whether they died during this period. The total tariff for each admission was derived from the Healthcare Resource Group (HRG, the basis of remuneration to the hospital for the cost of the admission) for that admission and the 2005 tariffs, adjusting for the hospital-specific market forces factor. HRGs not yet covered by the tariff were assigned a value equal to the average tariff for admissions in HRGs that are covered.
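The costing rule can be sketched as follows: look up each admission's HRG in the tariff, fall back to the mean tariff over admissions whose HRG is covered, then scale by the hospital's market forces factor (MFF). The order of operations is an assumption (the text does not say whether the average is taken before or after the MFF adjustment), and the HRG codes, hospital codes and prices are hypothetical.

```python
def admission_costs(admissions, base_tariff, mff):
    """Cost each admission as (base tariff) x (hospital market forces factor).

    admissions: list of (hrg_code, hospital_code); base_tariff: {hrg_code: price};
    mff: {hospital_code: factor}. Assumption: uncovered HRGs receive the mean base
    tariff of the covered admissions, and the MFF is applied after this imputation.
    """
    covered = [base_tariff[hrg] for hrg, _ in admissions if hrg in base_tariff]
    if not covered:
        raise ValueError("no admission has an HRG covered by the tariff")
    fallback = sum(covered) / len(covered)  # admission-weighted mean tariff
    return [base_tariff.get(hrg, fallback) * mff[hosp] for hrg, hosp in admissions]


# Hypothetical codes and prices for illustration only.
admissions = [("HRG1", "H1"), ("HRG1", "H2"), ("HRG2", "H1"), ("HRG9", "H1")]
tariff = {"HRG1": 1000.0, "HRG2": 2500.0}
mff = {"H1": 1.0, "H2": 1.2}
print(admission_costs(admissions, tariff, mff))  # HRG9 is costed at the mean, 1500
```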
For patients in the validation data set, we compared actual high-impact user status (i.e. whether each patient went on to have two or more further emergency admissions within a year) with whether their predicted probability of being a high-impact user, from the logistic models derived from the training data set, exceeded each of three threshold values. For each threshold we constructed a 2 × 2 table and calculated sensitivity, specificity and positive predictive value, based on the total number of index spells. This is analogous to comparing a potential new screening test for a disease with a gold standard: here, the ‘gold standard’ is actual high-impact user status and the new ‘test’ is whether the patient's modelled probability exceeds a given threshold value. The receiver operating characteristic (ROC) c statistic, equivalent to the area under the ROC curve, is widely used to summarize a model's ability to discriminate between outcomes, such as whether or not a patient died. A value of 0.5 means that the model is no better than chance at discriminating between outcomes, and a value of 1.0 indicates perfect discrimination. In general, values below 0.7 are considered to show poor discrimination, whereas values above 0.8 suggest very good discrimination.
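The validation statistics can be sketched in a few lines: a 2 × 2 table at a given threshold (counting a patient as flagged when the probability strictly exceeds it), and the c statistic computed with the rank-sum (Mann-Whitney) formula, which is equivalent to the area under the ROC curve and counts tied predictions as half. Function names are hypothetical; this is an illustration, not the authors' code.

```python
def screening_stats(actual, prob, threshold):
    """Sensitivity, specificity and PPV when patients with prob > threshold are flagged."""
    tp = fp = fn = tn = 0
    for y, p in zip(actual, prob):
        flagged = p > threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


def c_statistic(actual, prob):
    """Area under the ROC curve via average ranks (ties share the mean rank)."""
    order = sorted(range(len(prob)), key=lambda i: prob[i])
    ranks = [0.0] * len(prob)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and prob[order[j + 1]] == prob[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in actual if y)
    n_neg = len(actual) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, actual) if y)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


actual = [1, 1, 0, 0]          # 1 = became a high-impact user
prob = [0.9, 0.4, 0.5, 0.1]    # hypothetical modelled probabilities
print(screening_stats(actual, prob, 0.3), c_statistic(actual, prob))
```

The rank-based form avoids comparing every pair of patients, which matters at the scale of national HES data.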
Any choice of probability threshold is arbitrary, but for illustration we chose three thresholds that resulted in the identification (‘flagging’) of 250 000, 150 000 and 50 000 patients across England at high risk of becoming high-impact users. We chose the figure of 250 000 because the Department of Health for England has entered into a public service agreement with the Treasury to reduce emergency bed use by 5% in 2008; this has led to an emphasis on case management of 250 000 ‘very high intensive users’.14
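Choosing a threshold that flags a fixed number of patients amounts to taking the probability just below the top N. The sketch assumes that predicted probabilities are available for every patient over whom the total is counted, and that flagging means the probability strictly exceeds the threshold; with ties at the cut-off, slightly fewer than N patients may be flagged. The function name is hypothetical.

```python
import heapq


def threshold_for_top_n(probs, n):
    """Return t such that at most n patients have prob > t (exactly n if no ties at the cut)."""
    if n >= len(probs):
        return float("-inf")  # every patient is flagged
    # The (n + 1)-th largest probability: only the top n lie strictly above it.
    return heapq.nlargest(n + 1, probs)[-1]


probs = [0.1, 0.8, 0.3, 0.6, 0.5]  # hypothetical predicted probabilities
t = threshold_for_top_n(probs, 2)
print(t, sum(p > t for p in probs))  # threshold 0.5 flags 2 patients
```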
With 303 primary care trusts currently in England, this corresponds to an average of around 825 patients per primary care trust. Our other chosen totals correspond to around 500 and 165 patients per primary care trust, respectively, and might represent more manageable caseloads for community health services and primary healthcare professionals.
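The per-trust caseloads are simple divisions of each England total by the 303 primary care trusts:

```latex
\frac{250\,000}{303} \approx 825, \qquad
\frac{150\,000}{303} \approx 495 \;(\text{around } 500), \qquad
\frac{50\,000}{303} \approx 165
```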