The Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) has been proposed for use in comparative audit between surgeons and between hospitals. To assess its feasibility, POSSUM scoring was attempted on admission in all patients under the care of two consultant surgeons over a six-month period. Scores were awarded only if all investigations necessary for POSSUM were performed; investigations unnecessary for effective treatment were not performed.
Over the six-month period 815 patient discharges were recorded and 521 operative procedures were performed. Scores could be allocated for only 155 (30%) of the operative cases.
Scoring systems such as POSSUM are procedure-based, thereby excluding those who do not undergo an operation. However, most of our operative cases were also excluded. Full POSSUM scoring will often require additional investigations. POSSUM is unlikely to be of use in the wider setting of comparative audit.
Scoring systems in all areas of medicine are receiving close attention because of the need to evaluate and monitor healthcare delivery and outcomes. The main application is in comparative audit, but an effective scoring system could be useful for other purposes—for example in research, to standardize for case mix, and in clinical management as a prognostic indicator.
In surgery various risk scores are used, but a recent review article recommended the Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) as the most appropriate for general surgical practice [1]. POSSUM was originally described by Copeland et al. in 1991 [2] and its Portsmouth modification (P-POSSUM) in 1996 [3,4]. Both systems take the same 12 physiological variables and combine these with 6 operative variables (Box 1). Many of the evaluations have been in surgical subspecialties and, provided the data are correctly analysed, POSSUM has been found accurate in predicting outcomes [5,6,7,8]. Midwinter et al. [8] comment: ‘Such equations could also be used by an individual surgeon or unit to assess his/her performance’. We have examined the suitability of POSSUM for use in comparative audit within a general surgical setting.
Over six months, all patients admitted under the care of two general surgical consultants were included. On admission they were scored by POSSUM criteria on a datasheet containing all the physiological variables. Patients transferred to the care of the firm whilst inpatients and patients whose care was shared with another firm were included. Upon discharge the datasheet was completed by assignment of the operative scores, if appropriate. At weekly meetings of both surgical firms, each patient discharged during the previous week was discussed and any inaccuracies were corrected. Completed datasheets and discharge summaries were held on a computerized database (SIS Plus, London), which was updated weekly. The hospital patient administration system (PAS) was not relied upon.
Emergency admissions were investigated as thought appropriate by the admitting team. Preoperative routine investigation policy for elective admissions in our unit is as follows:
These guidelines are in keeping with those of most surgical departments and we did not deviate from our routine practice. We included all surgical cases, so no part of the workload was unaccounted for. Any patient who did not undergo the necessary investigations could not be given a physiological score.
Over the six months, 815 patients were discharged with 27 deaths (3.3%). In total there were 390 emergency admissions and 398 elective admissions and 27 patients were internal or external transfers (see Table 1). Not all patients had an operative procedure: some merely underwent investigation or non-operative treatment. This group included all those undergoing radiological investigation and intervention since they would not receive a POSSUM score by definition—though they constitute a substantial part of the surgical workload.
521 operations were performed on 501 patients, 162 of these classified as emergency and 342 as elective. The remaining 17 operations were performed on transferred patients. The American Society of Anesthesiologists (ASA) score for these patients is shown in Figure 1 and the level of complexity of the operations in Figure 2. A complete POSSUM score was obtained in 155 patients (29.8%). Emergency admissions accounted for 72 of these and elective admissions for 68. The remainder, 15, were from the transfer group.
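The proportions implied by these counts can be checked with a short calculation. The sketch below is illustrative only: the variable names are ours, and the breakdown by group assumes that the admission categories reported for scored patients correspond to the categories used to classify the operations, which the text does not state explicitly.

```python
# Counts taken from the results paragraphs above.
operations = {"emergency": 162, "elective": 342, "transfer": 17}
scored = {"emergency": 72, "elective": 68, "transfer": 15}


def scoring_rate(scored_count, operation_count):
    """Percentage of operations for which a complete POSSUM score was obtained."""
    return 100.0 * scored_count / operation_count


total_operations = sum(operations.values())  # 521
total_scored = sum(scored.values())          # 155
overall_rate = scoring_rate(total_scored, total_operations)

# Assumption: scored patients in each admission group are compared with
# the operations classified under the same group.
rates_by_group = {
    group: scoring_rate(scored[group], operations[group])
    for group in operations
}

print(f"Overall: {total_scored}/{total_operations} = {overall_rate:.1f}%")
for group, rate in rates_by_group.items():
    print(f"{group}: {scored[group]}/{operations[group]} = {rate:.1f}%")
```

On these assumptions the scoring rate is lowest in the elective group, which is consistent with the point that routine elective work-up seldom includes every investigation POSSUM requires.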
This prospective evaluation casts serious doubt on the suitability of POSSUM for use in audit of general surgeons or their units. The reasons for the disappointing performance seem to lie in the history of the scoring system. Firstly, in formulating the original scoring system, Copeland et al. excluded all patients who were admitted for less than 24 hours (number unknown) as well as those having trauma surgery. Day cases would thus have been excluded, as they were explicitly from P-POSSUM, along with children [3,4]. Day cases and children make up 48% of our workload. Furthermore, in Copeland's study all the patients had blood samples taken for determination of urea and electrolytes, haemoglobin concentration and white cell count, and all had electrocardiography performed. This blanket approach to preoperative investigation for inpatient treatment is not in keeping with our hospital guidelines, since it adds to workload with little benefit in terms of diagnosis or operative risk (for example, an electrocardiogram in a fit young adult undergoing appendicectomy). Further exclusion of those inpatients in our study who underwent operative treatment but did not have the full POSSUM work-up left a net scored population of about 30%. To improve the scoring rate, some have suggested that a test deemed unnecessary could be assumed normal and allocated a score of one, but the original reports imply that all tests necessary for a POSSUM score were performed [2,3,4]. In keeping with these reports, we chose not to allocate a ‘minimum’ score to those patients with an incomplete POSSUM physiological dataset.
Whilst it is true that our patients who were not scored probably represent those who underwent minor or intermediate surgery, if outcome is measured by any event other than mortality (e.g. complications) exclusion of such a large group would seriously affect analysis of the results. Also, deaths in the low-risk groups may flag important differences in care. For any comparative audit system to be effective, differences in morbidity and mortality should be especially highlighted in this group. Even if this group were included by assigning normal values for the physiological variables, both POSSUM and P-POSSUM would exaggerate the expected mortality, since the minimum risk of death with POSSUM is 1.08% and with P-POSSUM 0.20%. This low-risk group makes up the bulk of current general surgical practice, so surgeons would constantly appear to perform better than predicted.
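The minimum risks quoted above follow from the logistic form of the two equations. The sketch below is a minimal illustration, not the authors' own calculation. The coefficients are not given in this article; they are the commonly quoted published values for the POSSUM mortality equation and the P-POSSUM equation, and should be treated as assumptions here. The minimum scores assume that each of the 12 physiological and 6 operative variables scores one.

```python
import math


def predicted_mortality(physiological_score, operative_score, coefficients):
    """Predicted risk of death from a logistic equation of the form
    ln(R / (1 - R)) = intercept + a * PS + b * OS."""
    intercept, ps_coef, os_coef = coefficients
    logit = intercept + ps_coef * physiological_score + os_coef * operative_score
    return 1.0 / (1.0 + math.exp(-logit))


# Assumed coefficients (intercept, physiological, operative); commonly quoted
# published values, not taken from this article.
POSSUM_COEFFICIENTS = (-7.04, 0.13, 0.16)
P_POSSUM_COEFFICIENTS = (-9.065, 0.1692, 0.1550)

# Lowest possible scores: every variable allocated a score of one.
MIN_PHYSIOLOGICAL_SCORE = 12
MIN_OPERATIVE_SCORE = 6

possum_min_percent = 100 * predicted_mortality(
    MIN_PHYSIOLOGICAL_SCORE, MIN_OPERATIVE_SCORE, POSSUM_COEFFICIENTS
)
p_possum_min_percent = 100 * predicted_mortality(
    MIN_PHYSIOLOGICAL_SCORE, MIN_OPERATIVE_SCORE, P_POSSUM_COEFFICIENTS
)

print(f"POSSUM minimum predicted mortality: {possum_min_percent:.2f}%")
print(f"P-POSSUM minimum predicted mortality: {p_possum_min_percent:.2f}%")
```

With these assumed coefficients the POSSUM floor comes out at about 1.08% and the P-POSSUM floor at roughly 0.2%, so neither equation can predict a risk of zero for even the fittest patient having the most minor procedure.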
Another factor is timing. For POSSUM, Copeland et al. [2] scored their patients immediately before surgery, allowing for resuscitation after admission, whereas for P-POSSUM Whiteley et al. scored on admission [3,4]. Recent snapshot surveys based on the POSSUM dataset have been unclear about the time of scoring, and some groups have erroneously applied the P-POSSUM equation (derived from data collected on admission) to data scored preoperatively. Since preoperative intervention affects outcome [9,10], ambiguity of this kind will distort the results. Even if the time of scoring were standardized to be preoperative, a surgical firm that aggressively pre-optimized patients would not be shown to be providing better care, only to be operating on fitter patients than the average.
Finally, how does POSSUM perform in demonstrating the competence or otherwise of the surgical team? The score allocates additional points if the surgeon spills more blood and operates on the patient many times (perhaps because of complications) [5]. Copeland in his original paper assigns an operative score at discharge but does not clarify how this score is derived: is it the score on opening the abdomen, and if there are multiple procedures does one enter the highest score for the factors or the one at the last operation? Suppose, for example, that a man of 65 requires a left hemicolectomy for colonic obstruction. His initial physiological score is 15. Surgeon A operates with little blood loss and no peritoneal soiling (operative score 13). Surgeon B does the same procedure with greater blood loss and with peritoneal soiling, and the patient has to return to theatre with a colonic leak (operative score 30). Inserting these scores into the POSSUM equation gives an expected mortality of 5% for surgeon A and 43% for surgeon B. While POSSUM may correctly predict the expected risk of death in each scenario, it is of little value in flagging up surgical failings: surgeon B appears to be operating on more technically challenging patients. Put another way, what the POSSUM dataset appears to ask is ‘Given this patient and surgeon, what is the expected outcome?’ The question that any effective system for comparative audit needs to ask is ‘Given this patient, what is the expected outcome?’ In conclusion, while the POSSUM and P-POSSUM equations are potentially accurate in predicting mortality after an operation, in our study a large proportion of surgical patients were excluded by application of our standard criteria for investigation. Use of the POSSUM dataset effectively downplays surgeon-related variables. In our surgical practice it is not suitable for comparative audit.
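The figures in the hemicolectomy example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the coefficients are the commonly quoted values for the original POSSUM mortality equation, which are not printed in this article, and the function name is ours. The physiological score is held fixed at 15 so that only the operative score, which partly reflects the surgeon's own performance, changes between the two scenarios.

```python
import math


def possum_mortality(physiological_score, operative_score):
    """Predicted risk of death from the POSSUM mortality equation.

    Assumed form and coefficients (not given in the article):
    ln(R / (1 - R)) = -7.04 + 0.13 * PS + 0.16 * OS
    """
    logit = -7.04 + 0.13 * physiological_score + 0.16 * operative_score
    return 1.0 / (1.0 + math.exp(-logit))


# Same patient in both scenarios: physiological score 15.
PHYSIOLOGICAL_SCORE = 15

# Operative scores from the worked example in the text.
surgeon_a_risk = possum_mortality(PHYSIOLOGICAL_SCORE, operative_score=13)
surgeon_b_risk = possum_mortality(PHYSIOLOGICAL_SCORE, operative_score=30)

print(f"Surgeon A expected mortality: {100 * surgeon_a_risk:.0f}%")
print(f"Surgeon B expected mortality: {100 * surgeon_b_risk:.0f}%")
```

Because blood loss, soiling and return to theatre all raise the operative score, the complications in surgeon B's case raise the expected mortality against which surgeon B is then judged.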