All new assessment tools should be shown to have certain test characteristics, namely reliability and validity, before being put into general use. For the assessment of residents' surgical skills, the ESSAT thus far appears to satisfy these criteria. In a previous study we established the face and content validity of the ESSAT, and we now have data suggesting that this surgical skills assessment test has good inter‐rater reliability. This study also provides limited evidence of construct validity, used here to mean the ability to discriminate between residents at different levels of training.
Now that these test characteristics have been demonstrated, the ESSAT may be adopted by residency programmes as a new, structured and more objective method for teaching and assessing residents' surgical skills. The tool may help ensure that residents reach a basic level of competency before entering the operating room, where teaching is less controlled and patient risks cannot be eliminated. In addition, by facilitating the use of the microsurgical laboratory and emphasising basic surgical skills, the ESSAT may improve the overall process of early surgical education in ophthalmology residency and become an important part of the resident's surgical competency portfolio. For residency programmes in the United States, the ESSAT may be implemented as one step towards satisfying the demands of the ACGME and maintaining accreditation.
The primary aim of this study was to assess the inter‐rater reliability of the ESSAT. While the responses of experts were not identical, they were consistent in their overall determination of competency, which was set at a threshold of 70% of checklist items marked as correct and 3/5 on each of the global rating scale items. For each of the checklists, at least 90% of the raters were consistent with one another regarding whether the resident passed the threshold of competence. For the global rating scale, at least 80% of raters gave consistent scores, with the exception of the skin suturing station. At this station, 42% (8/19) of raters scored the junior resident above the passing threshold, and 58% (11/19) scored this resident below it. One factor that may have contributed to this lack of inter‐rater reliability is that more than half of the raters who “failed” this resident had also watched the video of the senior resident at this station. In fact, this video clip would have come just before the clip of the junior resident in the sequence on the mailed CDs. None of the raters who “passed” this resident had also received the video clip of the senior resident. Most likely, the raters who received both clips were more critical of the junior resident because they had just finished watching the clip of the more skilled senior resident. The inconsistency of the ratings exposes the fact that the global rating scale is not wholly objective. These inconsistencies may diminish once a rater has seen several residents perform the ESSAT and has developed a good sense of where different residents fall on the range of the scale. Interestingly, there was no inconsistency in rating skin suturing using the checklist. This may partly reflect the fact that the forced‐choice, binary (correct vs incorrect/not done) checklist items are less qualitative than the global rating scale.
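Concretely, the pass/fail rule above (at least 70% of checklist items correct and at least 3/5 on every global rating item), together with a simple percent-agreement figure across raters, can be sketched as follows. This is an illustrative sketch only, not the study's analysis code; the function names are ours, and the 19-rater example merely mirrors the skin suturing split reported above.

```python
# Illustrative sketch of the ESSAT "pass" rule and percent agreement.
# Not the authors' code; names and example values are assumptions.

def passes_checklist(items_correct, total_items, cutoff=0.70):
    """Pass if at least 70% of checklist items are marked correct."""
    return items_correct / total_items >= cutoff

def passes_global_rating(scores, cutoff=3):
    """Pass if every global rating item (scored 1-5) is at least 3."""
    return all(s >= cutoff for s in scores)

def percent_agreement(decisions):
    """Share of raters agreeing with the majority pass/fail call."""
    n_pass = sum(decisions)
    return max(n_pass, len(decisions) - n_pass) / len(decisions)

# Hypothetical example: 19 raters' pass/fail calls at one station,
# split 8 "pass" / 11 "fail" as at the skin suturing station.
decisions = [True] * 8 + [False] * 11
print(passes_checklist(15, 20))       # 75% correct -> True
print(percent_agreement(decisions))   # majority 11/19, approx. 0.579
```

Percent agreement is the simplest consistency measure; chance-corrected statistics such as Cohen's kappa would be a natural refinement as more rating data accumulate.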
As this was the first trial run of the ESSAT, the video quality was less than perfect and there were several segments that had problems with image stability and focus. Several raters commented on this issue, which may have contributed to some of the variability in scores on any particular video segment. A few raters who submitted incomplete assessment forms noted that they did so because they could not clearly see some portion of the task. With time and experience, the ESSAT video techniques will be refined, perhaps resulting in a further increase in the reliability of ratings.
The ESSAT appears able to distinguish between a junior and a senior resident (construct validity). Overall, the senior resident was rated above the “passing” threshold 94% of the time, whereas the junior resident was rated above it only 30% of the time. This finding offers some evidence of construct validity: the individual likely to be the more competent surgeon, the senior resident, did better on the ESSAT than the individual less likely to have well‐developed surgical skills. The ESSAT was less able to discriminate between the residents at the skin suturing station. This may reflect a ceiling effect for this relatively straightforward task; even the novice junior resident was able to “pass” the threshold of competency here. As only two residents were compared, further testing is clearly needed to assess the ESSAT's ability to discriminate between residents. Testing on more residents will also help determine whether the ESSAT can detect more subtle differences, such as that between a resident in the first year of training and one in the second, or between an average and a particularly talented senior resident.
The number of raters who were mailed each video segment was not equivalent. As a result, we have much more data on station three (phacoemulsification: wound construction and suturing technique) for both residents and on station one (skin suturing) for the junior resident. More data need to be collected to confirm that the inter‐rater reliability and construct validity of the other three segments remain strong with larger numbers of raters. In addition, as more data accumulate, we will be able to refine the “passing” thresholds to ensure that they are set at an appropriate level for residents to reach before beginning training in the operating room. It should be emphasised that the ESSAT is not intended to be a stressful test that prevents struggling residents from beginning their training in the operating room. Rather, the ESSAT is a form of quality assurance, guaranteeing that residents have adequate exposure to surgical techniques prior to beginning real‐life surgical training. Emphasis will be placed on providing constructive feedback through the specific items marked incorrect on the checklists, and on facilitated, guided practice in the microsurgical laboratory.
Ensuring the surgical competency of residents is a critical component of every ophthalmology residency programme. The traditional forms of surgical skills assessment (unstructured summative faculty evaluations written at the end of a rotation, and faculty meetings with discussion of residents' abilities) are inadequate. We have now collected evidence that the ESSAT is a reliable and valid new assessment tool. Complemented by other new assessment methods that evaluate residents' skills in the operating room3,4 and with simulation technologies,5,6 the ESSAT could help guarantee that all residents achieve surgical competence during their residency training.