Operative surgery is a learned skill. Although some individuals are more talented than others, any intelligent, well-motivated and well-coordinated person can do it if properly instructed. The technical act of an operation is seldom what gets surgeons into trouble. More likely sources of criticism are matters of judgment, working relationships with colleagues and (most commonly) communication—all these in the nebulous area of 'professionalism'. When we seek to assess a surgical trainee, how can the complex mix of skills and attitudes be fairly judged?
Though we cannot neatly define the surgeon's 'job', we can at least break it down into a series of outcomes that lend themselves to assessment. On the matter of professionalism, the General Medical Council has done this in documents such as Good Medical Practice and Tomorrow's Doctors. In many ways, however, the necessary attributes are better captured in the work of the Canadian Medical Association outlined in the CANMEDs 2000 project.1 Here the roles of any specialist doctor are expressed under the seven headings laid out in Box 1.
These seven categories facilitate the task of assessment in a useful way. The only omission in my view is 'leadership'. Leadership is about making the right strategic choices, whereas the 'manager' ensures that decisions are carried through properly—a crucial distinction.
Currently, a trainee's knowledge and certain skills are assessed by formal summative examinations, and performance in a much less direct way through the Record of In-Training Assessment (RITA). The candidate's actual performance as a surgeon is covered in a notorious 'yellow form' that informs the RITA process in a couple of lines. Though we are moving towards a competency or outcome based definition of what we look for in a surgeon, we still lack the supporting instruments needed for proper assessment.
The first step in this process is to devise a curriculum that comprehensively lists the qualities required, mappable back to categories such as those laid out in CANMEDs 2000. This exercise—now being undertaken through an Intercollegiate project nearing completion—will give us the great prize of a web-based curriculum for each of the nine major disciplines of surgery and will distinguish between knowledge, skills and professionalism. Although it is not yet fully compatible with an agreed outcome or competency assessment tool, this will be achievable in time.
The next move is to secure agreement on how the curriculum will be delivered (the syllabus of education and training) and on the assessment methodologies to be used. It is essential that these be treated as parallel activities, since syllabus drives assessment and vice versa. Though we might prefer candidates to take a broad view of education for education's sake, the truth is that most people learn only what is necessary to pass their assessments. I cannot here go into the balance between structured learning and experience tempered by reflection. This remains a huge area of controversy, especially against the background of limited hours of working, growing service demands and a drive to reduce rather than increase training time.
In the past, the knowledge and judgment of surgical trainees were assessed by summative methods—multiple-choice questionnaires; essays, vivas or orals (until recently); and clinical exams to make a holistic evaluation of skills in patient management. Regarding the last, consultant surgeons have always valued face-to-face encounters with those who aspire to join their ranks, and the tradition of examining in hospitals and clinics will not lightly be abandoned. In practical terms, however, it is increasingly difficult to use the resources of a stretched health service for a growing burden of examinations; and, more to the point, the oral and clinical exams perform poorly in terms of reliability.
Knowledge underpins competence, so the surgeon must demonstrate mastery of relevant knowledge. The American Board of Examiners2 have looked in great depth at the issue of asking reliable and valid questions and conclude that knowledge per se, and some aspects of interpretation and application (higher-order thinking), are best assessed by multiple-choice questionnaire (MCQ) techniques. Given that this huge body of work is supported across all the medical disciplines, there seems no point in surgeons going their own way. This is indeed the accepted position, and an intensive exercise to prepare modern and psychometrically robust MCQs is now in progress. As a result, surgeons have been freed to concentrate on ways to assess knowledge application and the higher-order thinking involved in judgment and decision-making, and we find ourselves looking critically at traditional methods. If we are to continue with summative examinations that go beyond tests of knowledge, we need methods that offer reliability in a valid context. Norman3 points out, in a thoughtful review, that almost any assessment method can be shown to be statistically reliable if enough events are observed; to be valid, however, the assessment must also sample from across the whole spectrum of the syllabus. Clinical exams based on 'long cases', for example, still fondly regarded by examiners, have only two observers, who work by consensus rather than marking independently and make their assessment in the context of a single clinical problem; moreover, the candidates in any one sitting may be asked about different clinical problems. This makes the whole process highly unreliable and scarcely valid. One can, of course, make it reliable by deploying many observers and reasonably valid by setting numerous long cases, but the exam then ends up taking eight hours.3
At undergraduate level, alternatives have been around for some time. The objective structured clinical examination (OSCE)4 and its derivatives are demonstrably reliable and valid, and yield reassuring data on quality assurance. At postgraduate level, however, we have been reluctant to extrapolate this model, largely because examiners feel the scope of the OSCE is too superficial to assess the complexities of professional practice. (Less excusably, many shun the technique because they find it boring.) We are at present exploring whether orals can be made reliable. To this end, examiners will need a good grasp of the whole concept, and we are now concentrating on 'examiner development' to improve quality in question-asking. As examiners learn to set agreed standards, there are signs of progress. Whether orals and clinicals can be made reliable without a structured approach (including OSCE-type activities) remains a vexed question, and we are open to accusations of wishful thinking. If we opted for the MCQ and OSCE model as a summative quality-assurance test, we could spend much more of our time and energy on the vital task of improving training. Moreover, assessment could be moved more into the workplace.
In a way, surgeons have always assessed trainees in the workplace—the apprenticeship tradition. But times have changed, and with the advent of shifts and limited hours this model is no longer possible even if desirable. In truth, like certain other dearly loved traditions, it fell short of the ideal—personalities got in the way, there were serious concerns about power and influence and a whole career could be made or blighted by such relationships. Nevertheless, workplace assessment does offer great opportunities if we can resolve the questions of reliability. The need, therefore, is to develop assessment tools that produce similar profiles of behaviour and performance irrespective of the observer—or, more practicably, profiles so similar that differences between observers can be dealt with statistically. The physicians have adopted such an approach with their use of Norcini's 2003 mini-CEX5 and the DOPS models; general surgery and orthopaedics have come to similar methodology independently. All the measurements are applied to everyday situations in real time—no setpieces or pseudopatients.
In the workplace, tools need to be practicable as well as reliable and valid. In all the current pilots in surgery and medicine, a starting point is that the assessment encounters are brief and focused on small areas of activity. The aim is to limit the effect on a busy working hospital whilst capitalizing on the relevant environment. For example, during a surgical attachment a young trainee might agree with her trainer that by the end of the attachment she should be proficient at hernia repair. The learning set for a hernia repair, with its key stages, is clear and this information is shared. After four months of gradually doing more and more, and a series of formative sessions, the trainee judges she is ready to be assessed. All the learning objectives are found to have been met, and after a ten-minute debrief at the end of an operation the two parties agree the trainee has demonstrated key competence. This might be repeated in a different attachment with another trainer, and gradually a body of evidence from different observers is accumulated into a growing competence portfolio. The point is that the task has been defined, the skills sought and taught and the competencies finally assessed against the goals. All this is achieved in half an hour at the beginning of training, in a series of assisted operations with formative feedback and in a final reflective assessment during and after a routine operation. How will consultant surgeons feel about such innovations? In North-East England, busy National Health Service consultants have shown themselves willing to participate, though the training-the-trainer sessions, vital to the success of these ventures, have proved expensive. What is particularly encouraging is that trainers and trainees express appreciation of having a shared framework and vocabulary on the issues of training and assessment.
We believe we now have the kernel of a system for workplace assessment of surgical trainees, of which the key elements can be summarized as follows:
Note: Professor Rowley is Director of Education, Royal College of Surgeons of Edinburgh. This editorial is based on the Stuart Lecture, given at the Royal Society of Medicine on 16 March 2004.