Analysis of the grades awarded demonstrated a significant difference in the mean marks awarded by supervisors and second markers, with supervisors marking nearly half a grade higher than second markers. The correlation between these markers' assessments of the reports was also modest, suggesting that the two groups of markers were not using the same criteria to reach their decisions, despite being provided with descriptors and a mark scheme. It is important to note that most supervisors were also second markers. They second-marked other reports at the same time as they assessed their own students' projects, and so had a direct and simultaneous comparison. The same individual therefore appeared to use different criteria depending on whether they were marking their own supervised student's report or another student's. The lack of a significant difference between the mean marks awarded by the second marker and the control marker suggests that they were awarding the same range of grades overall, but the modest correlations indicate that for individual students there was again substantial inter-marker variability. Control markers, unlike supervisors and second markers (who may supervise only one project a year), have experience of reviewing large numbers of SSM reports. There was also a significant difference between the mean marks awarded by supervisors for performance and for written reports, but here the correlation between the marks was much higher. However, further analysis of this finding by linear regression failed to demonstrate an undue influence of the performance mark on that of the report.
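The distinction drawn above between agreement in mean marks and agreement on individual students can be illustrated with a minimal sketch. The marks below are invented for illustration only and are not data from this study; the helper names `paired_t` and `pearson_r` are hypothetical, and the paired t statistic and Pearson coefficient are assumed here as one common way of making such comparisons, not necessarily the tests used in the analysis.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic for the mean of the per-student differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pearson_r(a, b):
    """Pearson correlation coefficient between two lists of marks."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Invented marks for six students (illustration only, not study data).
# Pattern 1: two markers award the same set of marks overall,
# but to different students.
second_marker = [55, 60, 65, 70, 75, 80]
control_marker = [65, 55, 80, 60, 70, 75]

# Pattern 2: two sets of marks that rank students almost identically,
# with one set consistently higher. Which set is higher is arbitrary here.
marks_high = [60, 66, 69, 76, 79, 86]
marks_low = [55, 60, 65, 70, 75, 80]

mean_gap = mean(second_marker) - mean(control_marker)
r_modest = pearson_r(second_marker, control_marker)
t_shift = paired_t(marks_high, marks_low)
r_high = pearson_r(marks_high, marks_low)

print(f"Pattern 1: mean difference {mean_gap:.1f}, r = {r_modest:.2f}")
print(f"Pattern 2: paired t = {t_shift:.1f}, r = {r_high:.2f}")
```

In the first pattern the means are identical while the correlation is modest, so individual students can still receive quite different marks from the two markers. In the second the correlation is high but the paired differences are consistently positive, which is what a systematic shift in marks looks like.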
Although we have been unable to provide evidence that the supervisor's mark for performance has an undue influence on the mark for the written report (halo effect), we have demonstrated that supervisors mark significantly higher than second markers, suggesting a leniency effect. This indicates that the supervisor's mark is influenced by having known and worked with the student. Such effects have been demonstrated before in many forms of education [4]. Factors contributing to this may include insight into, and therefore sympathy for, the student's difficulties in performing the project; an inability to be objective when the student has become part of the work team; an unwillingness on the part of the supervisor to acknowledge that a piece of work emanating from their team is of poor quality; or a lack of the confidence or courage to give a poor assessment to the student in person. These factors need further exploration.
Increasingly in medical education, supervisors are expected to summatively assess their students [9]. Assessors are unlikely to be affected equally by leniency and halo effects, and this will advantage some of their students and disadvantage others. These effects are likely to be strongest on supervisors who, like some of those in our study, are assessing a relatively small number of students and are inexperienced in assessment [6]. If we are to continue to use supervisor-based assessments we must find ways to combat these effects. Other authors' suggestions for improving objectivity and partially overcoming halo and leniency effects include detailed marking sheets [6], training for assessors in providing feedback of assessments [5], and also providing feedback on assessors' marking performance [6].
We are aware that the marking scheme in Figure , while structured, still permitted a fair degree of interpretation by examiners. Since carrying out this project we have introduced more detailed marking schemes with specific questions and detailed descriptors for each level of achievement for assessing the students' performance and report. This now includes an assessment of how the student overcame any problems which arose and how this may have affected the outcome of the project. We have also provided more detailed guidance to markers. We intend to review the inter-marker variability in light of the increased guidance given to markers.
These findings raise the ethical question of whether we should continue to use supervisors in this assessment process. We plan to continue to use supervisors as markers because of the expertise they bring to the specific field of study and their realistic expectation of the difficulties encountered by the student during the course of the project. In addition, the supervisor is sometimes the only person capable of marking the student's performance, which we consider a very valuable assessment of the student's personal and professional abilities. We do realise that this is a difficult responsibility for supervisors. Better staff development of supervisors as markers and a more detailed marking schedule may help ensure appropriate marks for performance. Furthermore, we will consider introducing 360-degree assessment to include all members of staff who have interacted with the student, particularly to improve formative feedback to students.