It would appear entirely uncontroversial to suggest that prostate cancer patients should have available information on surgeon outcomes so that they can make informed treatment decisions. We argue, however, that release of surgeon-level data on radical prostatectomy outcomes would be premature at the current time. We point to a series of problems that would need to be addressed before we could be sure that a consumerist approach to surgeon selection would do more good than harm. These include non-standardized reporting of endpoints such as urinary and erectile function, statistically unreliable estimates from low-volume surgeons and perverse incentives, such as the referral of high-risk patients to radiotherapy. We recommend an alternative to the “name-and-shame” paradigm of public outcomes reporting: continuous quality improvement. Surgeons are given reports on their own outcomes on a private basis, such that no-one else can see their data. This helps to build trust and to avoid perverse incentives. Such reports must be multi-dimensional and based on a comprehensive, patient-reported outcomes system. As outcomes data are meaningless for low-volume surgeons, these surgeons would have to choose between focusing on radical prostatectomy and referring patients to higher-volume colleagues. Systematic research is required to determine whether such an approach would do more good than harm.
There is a wealth of information to help the consumer make even the most trivial purchasing decisions. Packets of soup list nutritional information; cars can be compared in terms of fuel efficiency, acceleration, trunk capacity and reliability; even renting a DVD can be based on a sophisticated algorithm evaluating hundreds of thousands of movie ratings. Yet a man with prostate cancer choosing radical prostatectomy has almost no basis whatsoever to choose between two surgeons. Indeed, we have heard patients citing what they explicitly ascribe to rumor in support of treatment decisions.
It would appear entirely uncontroversial to suggest that prostate cancer patients should have a similar level of information about prospective surgeons as, say, cans of soup. Yet we will argue in this paper that release of surgeon-level data on radical prostatectomy outcomes would be premature at the current time. We point to a series of problems that would need to be addressed before we could be sure that a consumerist approach to surgeon selection would do more good than harm.
It is self-evident that proper ascertainment of outcomes is fundamental to any outcomes reporting system: miles per gallon ratings are only helpful for car buyers because we trust the government-regulated system for testing fuel efficiency. In the case of outcomes after radical prostatectomy, there is no system at all, let alone one that is widely trusted and carefully regulated. Moreover, it is known that there is important variation in the literature regarding assessment of outcomes such as complication rates, erectile function and biochemical recurrence. For instance, rates of erectile function recovery after radical prostatectomy reported in the literature vary from less than 25% to more than 90%. This is due, at least in part, to the wide variety of different methods of assessing outcome and defining potency: >17 on IIEF6; ≥26 on IIEF6; >21 on SHIM; grade 1 – 2 on a 5 point scale; grade 1 – 3 on a 5 point scale; “ability to have sexual intercourse” and so on and so forth.
Yet even if all surgeons were to use the same method of assessing outcome on the same follow-up schedule (say, the IIEF6 every 3 months for the first two years), it is not clear that direct comparisons between surgeons would be fair. Cars are tested for MPG using the same fuel on the same road, but surgeons treat patients who differ in terms of age, baseline function and tumor severity. Thus any comparative data would have to include careful adjustment for case mix. Risk adjustment for oncologic outcome is well established; for complications and functional outcomes there is a dearth of appropriate literature.
We have previously reported very low annual volumes for most surgeons who perform radical prostatectomy. Most commonly, urologists treated only a single case per year, with 80% of surgeons having an annual volume of 10 or fewer. It is all but impossible to get reliable estimates of outcomes from surgeons who have these sorts of low caseloads. Take, for example, a surgeon with the median annual volume of three cases who, after ten years in practice, reported persistent urinary dysfunction in 6 of 30 patients. The 95% confidence interval around this estimate of 20% is 8% to 39%, close to a fivefold variation. Using the car buying analogy, this would be the equivalent of not knowing whether a car’s true fuel efficiency was that of a Lamborghini or a Honda Hybrid.
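To make the arithmetic concrete, the interval quoted above can be reproduced with an exact (Clopper-Pearson) binomial confidence interval. The sketch below, using only the Python standard library, is our illustration rather than the method of any particular reporting system; the function names are ours.

```python
# Clopper-Pearson exact 95% CI for a binomial proportion, using only the
# standard library. Illustrates the near-fivefold uncertainty around a
# low-volume surgeon's complication rate (6 events in 30 patients).
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(events, n, alpha=0.05):
    """Clopper-Pearson interval via bisection on the binomial tails."""
    def solve(condition):
        lo, hi = 0.0, 1.0
        for _ in range(60):              # bisection to high precision
            mid = (lo + hi) / 2
            if condition(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower bound: largest p with P(X >= events) <= alpha/2
    lower = 0.0 if events == 0 else solve(
        lambda p: 1 - binom_cdf(events - 1, n, p) <= alpha / 2)
    # upper bound: smallest p with P(X <= events) <= alpha/2
    upper = 1.0 if events == n else solve(
        lambda p: binom_cdf(events, n, p) > alpha / 2)
    return lower, upper

lo, hi = exact_ci(6, 30)
print(f"6/30 = 20%; 95% CI {lo:.0%} to {hi:.0%}")  # roughly 8% to 39%
```

The same calculation shows why the problem compounds at still lower volumes: with 2 events in 10 patients the interval spans roughly 3% to 56%.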
Small differences in surgical outcomes are well worth having. Many breast cancer patients opt for chemotherapy even for a 1 – 2% decrease in the risk of recurrence[6–7]. It is difficult to believe that men with prostate cancer would not change surgeon – something far less unpleasant than, say, doxorubicin – for the same level of benefit. But detecting small differences requires large numbers of patients.
Table 1 shows the hypothetical results of 10 surgeons, each of whom has treated 50 patients. If we had to advise a relative, we would undoubtedly make a strong recommendation that they consult with surgeon 1 or 3, and give a clear warning against surgeons 2 and 5. But as it happens, the data set was randomly created assuming that each surgeon had an identical recurrence rate (20%), and, as would be expected, there is no statistically significant difference between the results (p=0.5). As a further thought experiment, imagine that there were only two surgeons in a city, and that we wanted to know which was better. Even if we imagine a very large difference in their recurrence rates (20% vs. 15%), we would need each to have treated about 1000 patients in order to have the traditional level of statistical confidence that one was indeed better than the other.
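Both thought experiments are easy to verify numerically. The sketch below is our illustration, not the simulation used to generate Table 1: it draws ten equally skilled surgeons at a true 20% recurrence rate, then applies the standard normal-approximation sample-size formula for two proportions (assuming a 5% two-sided alpha and 80% power).

```python
# Ten surgeons with an identical true recurrence rate of 20%, 50 patients
# each: the observed rates scatter widely by chance alone. Also shown: the
# approximate per-group sample size needed to distinguish 20% from 15%.
import random
from math import sqrt

random.seed(1)  # arbitrary seed for reproducibility
TRUE_RATE, N_PATIENTS, N_SURGEONS = 0.20, 50, 10

rates = [sum(random.random() < TRUE_RATE for _ in range(N_PATIENTS)) / N_PATIENTS
         for _ in range(N_SURGEONS)]
print("Observed recurrence rates:", [f"{r:.0%}" for r in rates])
print(f"Spread: {min(rates):.0%} to {max(rates):.0%}, despite identical skill")

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Sample size per group for comparing two proportions
    (normal-approximation formula; 5% two-sided alpha, 80% power)."""
    p_bar = (p1 + p2) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return num / (p1 - p2) ** 2

print(f"Patients per surgeon to detect 20% vs 15%: ~{n_per_group(0.20, 0.15):.0f}")
```

The formula returns roughly 900 patients per surgeon, consistent with the "about 1000" figure above.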
Public reporting of patient outcomes is a policy, and policies do not always work as intended. The classic case is prohibition, which was originally advocated, in part, as a measure to fight crime. This was not unreasonable, given the well known links between drunkenness and criminality, but in this respect prohibition was entirely counterproductive. Moreover, even the most well-meaning of policies can create perverse incentives. Indeed, there is now a considerable literature on how efforts to improve the quality of medical care can reward behavior that is not in patients’ best interests[8–10]. One well known paper is that of Boyd et al., who show that following guideline-recommended care for an elderly patient with multiple comorbidities would result in the patient being subject to an unreasonably large number of different tests and treatments, including 12 different medicines taken in 19 doses at five different times during a typical day. The authors argue that if doctors were paid or evaluated on the basis of how closely they followed guidelines, this might create an incentive for them to give inappropriately complex care associated with an increased risk of interactions.
It is not hard to envision how reporting of radical prostatectomy outcomes could create perverse incentives for surgeons. For example, because recurrence takes time to become apparent, oncologic outcome might be reported using the more immediate endpoint of positive surgical margins. But attempts to reduce positive margins may come at the expense of functional outcomes. Urologists might also be tempted to reduce their margin rates by referring patients with more advanced stage disease to radiotherapy. This would be counter-productive, as it is exactly the highest risk patients who stand the most to gain from surgery.
If reporting of surgeon outcomes to the public is an intervention, then we should seek data as to whether that intervention is effective. The specialty that, to its credit, has done most to promote openness about outcomes is cardiac surgery. Hospital and even individual surgeon data on volumes and death rates in New York State are publicly available. For example, one can look up that, between 2005 – 2007, Dr. Tortolani conducted 353 coronary bypass surgeries and experienced 4 deaths. After risk adjustment, this was close to the average. However, another surgeon, Dr. Merav, had 7 deaths in 118 procedures, a rate that, after adjustment, is a statistically significant six times higher than average.
Yet there remain grave doubts as to whether programs of this sort do more good than harm. With respect to benefit, there is evidence that the implementation of statewide reporting of mortality data has had little effect on outcomes. Although there were reductions in mortality in New York State comparing the period before and after the introduction of outcome reporting, similar reductions were seen in Massachusetts, which had no public reporting. With respect to harm, data suggest that high risk patients are less likely to undergo surgery in New York compared to other states[15–16]. In a direct comparison between New York and Michigan (which, like Massachusetts, does not have public outcomes reporting), Michigan patients were more likely to undergo percutaneous coronary intervention for acute myocardial infarction and cardiogenic shock. Of particular interest, although the unadjusted in-hospital mortality rate was about 50% lower in New York, there was no difference after adjustment for case mix. The most compelling interpretation of these data is that cardiologists subject to outcomes reporting are avoiding high risk cases for fear of inflating their mortality rates and that, as a result, patients who stand to benefit from surgery are left untreated.
There is direct evidence that this is indeed the case. Narin and colleagues sent a questionnaire to New York cardiologists that directly asked about the effect of outcomes reporting on clinical decision making. Close to 80% stated that publication of mortality statistics had influenced their decision about angioplasty for individual patients, with a decreased likelihood of intervening for the patients at highest risk. An important reason was that almost all respondents (85%) believed that the method of statistically adjusting results was insufficient, such that surgeons who operated on high risk patients would appear to have worse than average results.
The New York experience is not unique in suggesting that public reporting of outcomes is of dubious value. In a randomized study of Canadian hospitals, for example, public report cards on cardiovascular care (e.g. proportion of patients given aspirin within 6 hours of arrival) had no effect on either process or outcomes of care. A systematic review of 45 studies on the public release of performance data concluded that “the effect of public reporting on effectiveness, safety, and patient-centeredness remains uncertain”.
Public reporting of outcomes data might theoretically improve patient outcomes in one of two ways: by redistributing patients between worse and better providers (“name-and-shame”) and by encouraging surgeons with sub-par results to examine their technique and procedures (“continuous quality improvement”). Our view is that continuous quality improvement can be achieved without public reporting of outcomes and that name-and-shame is inherently problematic.
Assume that Table 1 reported the true recurrence rates of each surgeon. Would it really be advisable to direct patients to the surgeon with the best outcome? It is not at all clear that surgeon 3 could take on an annual caseload of 500 patients, or even 250, if cases were split with surgeon 1. Moreover, given that there are a finite number of hours in the day, other activities of surgeon 3 – such as treatment of other cancers, or research – would need to be redirected back to the surgeons who had lost radical prostatectomy volume. This might lead to an inferior surgeon treating more cases of bladder cancer, which is associated with higher rates of cancer death after surgery, and so could result in a net increase in death rates. Alternatively, surgeon 3’s research, which could potentially help thousands of patients throughout the world, might falter under the increased clinical workload. As such, it is reasonable to question whether changing practice patterns in the light of surgical outcomes data would have done more good than harm.
If our focus becomes continuous quality improvement, then we would argue that public reporting of surgeon outcomes is more of a hindrance than a help. First of all, it is unnecessary. No doubt surgeons need to know their outcomes in order to improve, but there is no reason why these data have to be shared with the public; they can be conveyed to the surgeon privately. Second, public reporting is counter-productive because, as described above, it creates perverse incentives, such as withholding treatment from the patients who have the most to gain. Third, public reporting builds distrust amongst surgeons. If I announce to the world that a particular surgeon has poor outcomes, then that surgeon has every incentive to doubt my data and my methodology. Indeed, that is exactly the experience of the New York cardiology initiative, where fewer than 1 in 6 surgeons believed the published results. If, on the other hand, the information is given confidentially (“these data are provided to you in an effort to help you improve your surgical outcomes; no-one else has access to the data or knows your results”), most surgeons will likely come to realize that ignoring the feedback is not in their own best interest.
Our view is that we would best help our patients were we first to develop reliable systems for continuous quality improvement and only then consider whether public reporting of outcomes would do more good than harm.
Surgeon outcomes can only be compared if the methods for ascertaining outcomes are comparable and outcomes are routinely obtained on all, or nearly all, patients. Thirty day mortality is a relatively straightforward endpoint that can be downloaded from national death statistics; in the case of radical prostatectomy, conversely, we need data on functional outcomes that can only be obtained directly from the patient.
The typical way to obtain patient-reported outcomes is by questionnaire. But it would be prohibitively costly to administer questionnaires to each of the many tens of thousands of radical prostatectomy patients treated each year, especially considering the vast amount of data entry and data management that would be required. We have previously argued that patient-reported outcomes need to be integrated within routine clinical care. The outcomes scientist wants to know whether the patient is continent at one year so that surgeon results can be compared; the surgeon should want to know this information as well so that an appropriate referral to a voiding dysfunction specialist can be made. We further point out that it is only feasible to collect patient-reported outcomes as a routine part of clinical care using electronic interfaces, which incur only negligible costs once they are established. Such a system has been implemented at Memorial Sloan-Kettering Cancer Center (MSKCC). In brief, patients are sent an email at regular intervals after surgery, with a link to an online questionnaire. Responses to the questionnaire are ported automatically to the patient medical record, where they can be accessed by the surgeon during follow-up care. This allows better clinical management of the patient, but also enables surgeons to receive feedback as to their results.
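As an illustration only – this is not MSKCC's actual implementation, and all identifiers are hypothetical – the scheduling logic of such a system might look like the following, assuming the every-3-months-for-two-years follow-up schedule mentioned earlier.

```python
# Minimal sketch of follow-up scheduling for patient-reported outcomes:
# each patient is due a questionnaire email at fixed intervals after
# surgery; responses already received are skipped.
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumed schedule: every ~3 months (91 days) for the first two years.
SCHEDULE_DAYS = [91 * i for i in range(1, 9)]

@dataclass
class Patient:
    patient_id: str
    surgery_date: date
    completed: set = field(default_factory=set)  # follow-up days already surveyed

def due_questionnaires(patients, today):
    """Return (patient_id, follow-up day) pairs that are due but not yet done."""
    due = []
    for p in patients:
        for day in SCHEDULE_DAYS:
            if p.surgery_date + timedelta(days=day) <= today and day not in p.completed:
                due.append((p.patient_id, day))
    return due

patients = [Patient("pt-001", date(2024, 1, 10), {91}),  # 3-month survey done
            Patient("pt-002", date(2024, 6, 1))]
print(due_questionnaires(patients, date(2024, 10, 1)))
# → [('pt-001', 182), ('pt-002', 91)]
```

The marginal cost of running such a loop is negligible once the electronic interface exists, which is the economic point made above: the expense lies in building the system, not in each additional questionnaire.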
Continuous quality improvement requires surgeons to change both their beliefs and behavior. And surgeons, like most of us, are notoriously resistant to change: the theory of “cognitive dissonance” holds that when individuals are presented with information that contradicts their beliefs, or would require them to modify their behavior, they simply discount the information.
We have seen this effect in practice at our own institution. We arranged for surgeons to be sent data on their radical prostatectomy outcomes, case mix adjusted, and in comparison to their peers. Results were private, and surgeons were explicitly informed that no-one, not even the statisticians analyzing the data, knew whose results were whose. But when one of us (AV) then interviewed each surgeon, it quickly became apparent whether a surgeon had better or worse results: some opened the interview by stating flatly that the results should not be believed, because the outcomes ascertainment was biased and the case mix adjustment inadequate. Our view is that such resistance can be overcome, but this will take careful planning and a good deal of time and effort.
Telling someone that they are doing something wrong may not be enough; you also have to tell them what they need to do to put things right. There is evidence that performance or outcomes feedback may sometimes be insufficient to improve doctor performance and that such feedback is best complemented by educational interventions. However, it is far from clear what educational interventions might enhance performance feedback for radical prostatectomy. There are some obvious possibilities: encouraging below average surgeons to scrub in with acknowledged experts; providing surgical videos demonstrating not only best technique at specific points in the radical prostatectomy, but common mistakes as well; links to papers or other educational materials that describe best practice. Yet the efficacy or otherwise of these approaches remains to be established.
Surgeons who know that their outcomes are being evaluated may act to improve outcomes in a manner that is not in the best interests of their patients. But much of what they might do is predictable, and can be circumvented by the use of multiple endpoints. Giving surgeons feedback on their margin rates may encourage them to cut wider at the expense of functional outcomes, thus functional outcomes have to be given equal prominence. Feedback may also influence patient selection, hence patient selection itself should be an endpoint, with surgeons given feedback as to whether the patients they treat are, on average, at higher or lower risk than those treated by other surgeons.
There is a copious literature suggesting that low volume surgeons have poorer outcomes than their higher volume counterparts[22–24]. Moreover, even after adjusting for surgical volume or experience, the results of different surgeons can vary, often dramatically[22, 25]. At the very least, this means that surgeons cannot know whether or not they are any good until they see their results. It therefore would seem perfectly reasonable to insist that surgeons keep track of how they are doing and make adjustments to their technique in the light of their results. The key point is that low volume surgeons cannot engage in this sort of self-assessment. It is worth restating the example given above: a surgeon at the median US volume for radical prostatectomy (3 cases a year) would, even after ten years of practice, face a five-fold variation between the upper and lower bounds of the confidence interval around estimates such as positive margin rates or incontinence. It may be true that “quantity is not a marker for quality”, but unless you have quantity, you’ll never know about quality. Hence low volume urologists should be asked to make a choice: either increase your focus on radical prostatectomy at the expense of other aspects of your practice, or refer prostate cancer patients to higher volume peers. It is not straightforward for a doctor to change practice patterns overnight in order to become a high volume provider, and so our guess is that most urologists would choose to give up radical prostatectomy and refer patients to specialist centers.
There are plenty of reasons to believe that giving surgeons feedback about their results will improve performance. But many similarly logical and well-intentioned interventions have failed to be of benefit, or even done more harm than good. In the case of radical prostatectomy, we need look no further than neoadjuvant therapy as something that should have worked, but did not[26–27]. There is clearly a need for carefully designed studies assessing how best to give performance feedback to surgeons, what educational interventions might complement feedback and to what degree feedback improves outcomes.
Reporting individual surgeon results seems like simple common sense; keeping those results out of public hands is redolent of corruption and secrecy. But there are good reasons to believe that public reporting may do more harm than good, providing misleading data and creating perverse incentives that can lead to poorer patient care. One of the main benefits of outcomes reporting – that surgeons can see their results in comparison with their peers, and have an incentive to improve – can also be achieved with performance feedback that is private, such that only the individual surgeon is able to access his or her results. But several steps have to be taken before even private outcomes reporting should be attempted: there has to be wide agreement, trusted by surgeons, on how to ascertain outcomes; feedback needs to be complemented by educational interventions to aid those with poor performance; feedback has to be multi-dimensional to avoid perverse incentives; surgeons need to practice at reasonable case volume, or not at all; and there needs to be systematic research to determine whether and how feedback improves performance.
Some of these changes are quite profound: we do not currently encourage urologists with a low volume of radical prostatectomy to shift their practice to other aspects of urology, nor insist that urologists routinely collect functional outcome data on all their radical prostatectomy patients using standardized questionnaires. Yet the current system – where surgeons have almost no idea whether they are successful or not, and may simply continue to repeat the same mistakes – is unsupportable. Fundamental change is surely indicated if we want to offer the best care to our patients.
Supported in part by funds from David H. Koch provided through the Prostate Cancer Foundation, the Sidney Kimmel Center for Prostate and Urologic Cancers and P50-CA92629 SPORE grant from the National Cancer Institute to Dr. P. T. Scardino
Andrew Vickers, Health Outcomes Research Group of the Department of Epidemiology and Biostatistics at Memorial Sloan-Kettering Cancer Center.
James Eastham, Chief of the Urology Service in the Department of Surgery at Memorial Sloan-Kettering Cancer Center.