Despite advances in surgical training, microsurgery is still based on an apprenticeship model. To evaluate skill acquisition and apply targeted feedback to improve our training model, we applied the Structured Assessment of Microsurgery Skills (SAMS) to the training of microsurgical fellows. We hypothesized that subjects would demonstrate measurable improvement in performance throughout the study period and that this improvement would be rated consistently across evaluators.
Seven fellows were evaluated during 118 microsurgical cases by 16 evaluators over three 1-month evaluation periods in 1 year (2010-2011). Evaluators used SAMS, which consists of 12 items in four areas: dexterity, visuo-spatial ability, operative flow, and judgment. To validate the SAMS data, microsurgical anastomoses in rodents performed by the fellows in a laboratory at the beginning and end of the study period were evaluated by five blinded plastic surgeons using the SAMS questionnaire. Primary outcomes were change in scores between evaluation periods and inter-evaluator reliability.
Between the first two evaluation periods, all skill areas and overall performance significantly improved. Between the second and third periods, most skill areas improved, but only a few significantly. Operative errors decreased significantly between the first and subsequent periods (81 vs. 36; p<0.05). In the laboratory study, all skills were significantly (p<0.05) or marginally (0.05≤p<0.10) improved between time points. The overall inter-evaluator reliability of SAMS was acceptable (α=0.67).
SAMS is a valid instrument for assessing microsurgical skill, providing individualized feedback with acceptable inter-evaluator reliability. Between the first two evaluation intervals, the microsurgical fellows’ skills increased significantly, but they plateaued thereafter. The use of SAMS is anticipated to enhance microsurgical training.
Concepts of technical training differ depending on the discipline. Many trainable tasks consist of a series of definable steps for which a systematic curriculum can be designed, developed, and executed and the results measured; flight schools, for example, follow this training model. Other disciplines that tend to be more artisanal have flourished under an apprenticeship model in which the trainee is expected to absorb much of what is needed through observation and then apply a diverse skill set under a variety of circumstances; most of the arts follow this model and so, traditionally, has surgical training.
Concepts of instructional design are focused on creating “instructional experiences which make the acquisition of knowledge and skill more efficient, effective, and appealing.”1 This process consists of determining the current skill level and needs of the learner, defining the goal of instruction, and designing an “intervention” to facilitate a transition. 1-2 Ideally, the surgical training process is informed by pedagogical (teaching) and andragogical (adult learning) theories and can include self-study, surgical simulation, and instructor-led scenarios. The outcome of instruction may be directly observable and scientifically measured, or completely unexamined and assumed, as has historically been the case in surgical training. 3-4
Two forces, work hour restrictions and the desire for health outcomes assessment, have created demand for a more systematic, measurable approach to surgical training. Under conditions of restricted training time, concerns have arisen about not only compliance but also the adequacy of surgical training in a compressed amount of time. 5-8 In order to provide appropriate training under the auspices of the Accreditation Council for Graduate Medical Education (ACGME) guidelines, a paradigm shift has emerged in which the training of surgical residents is no longer limited to the hospital and the operating theater. The other force behind a shift to more systematic training is the desire for measurable quality outcomes in health care. The Institute of Medicine has made this a priority, and for good or bad, the Affordable Care Act will use outcome measures to drive reimbursement and resource allocation.
The need for systematic evaluation of our surgical training programs cannot be overstated. In a world that is increasingly dominated by metrics, developing a reproducible, valid, low-cost, and easy-to-administer evaluation instrument is essential for self-improvement. Proactive pursuit of this goal, besides being worthy in and of itself, is a critical defense against having such metrics imposed upon us from external sources, who will likely have a dimmer understanding of the principles and practice of our profession. One instrument for surgical evaluation is the Structured Assessment of Microsurgery Skills (SAMS), a model designed to assess technical skills during microsurgery that was formulated from other assessment tools such as the Imperial College Surgical Assessment Device (ICSAD) and the Objective Structured Assessment of Technical Skills (OSATS). 9-12 The SAMS is an appropriate instrument for microsurgical assessment because it covers the core components of microsurgical skills. To ensure that the instrument was valid in the context of our clinical training program, we chose to validate the instrument in a controlled laboratory setting.
The purpose of this study was to use SAMS to monitor the maturation process of microsurgical skills in a cohort of microsurgical fellows over a 1-year period, with the ultimate goal of providing targeted feedback to enhance our training model. To our knowledge, this is the first study that evaluates microsurgical skill in a cohort of microsurgical trainees in a clinical setting, while simultaneously validating the assessment instrument in a laboratory model.
Seven fellows were enrolled in a 1-year (July 2010-June 2011) microsurgery fellowship at The University of Texas MD Anderson Cancer Center after completing the requisite plastic surgery residency training. The training program has been in existence for 21 years, and more than 120 fellows have completed the program. Currently, the program has 19 full-time microsurgery faculty, and it emphasizes graduated responsibility in both patient care and operative experience. Fellows participate in pre/postoperative clinic visits, operative cases, didactic education, and clinical research. Performance feedback from faculty is encouraged during each case, and formal, structured written and face-to-face feedback is provided by the program director (CEB) twice yearly. Fellows are expected to progress during the year of training and to competently perform microsurgery reconstructions independently by the end of the year.
The fellow cohort underwent an identical didactic and clinical training during the same time period at the same institution. The evaluation of their skills was based on SAMS and included clinical assessment as well as a laboratory evaluation, where conditions were standardized.
In the first part of this two-part study, 7 fellows were evaluated during 118 microsurgical cases by 16 faculty members in the Department of Plastic Surgery during three 1-month intervals at the beginning (August), middle (February), and end (May) of the training year. For each of the 1-month evaluation periods, the fellows were evaluated after each consecutive microsurgical case during that month. Each fellow participated in an average of 7 (range, 5 – 10) cases per month. Fellows selected cases weekly on the basis of their preference, resulting in one faculty evaluator per case. Each of the participating faculty members evaluated all the fellows by the end of the study period.
The evaluators used the SAMS, which consists of 12 items grouped into four main categories of microsurgical skills: dexterity, visuo-spatial ability, operative flow, and judgment (Fig. 1). Each category is subdivided into three technical components. The overall score is the average of all 12 items and can range from 1 to 5. Specific errors (e.g., unequal stitch bites, back walling) were documented from a list of 25 common mistakes made during microsurgical anastomosis (Appendix 1).
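The scoring rule described above (12 items, three per category, each presumably rated on the same 1-to-5 scale as the overall score, with the overall score taken as the mean of all 12 items) can be sketched in a few lines of Python. This is only an illustration: the function name `sams_summary` and the example ratings are hypothetical and are not study data, and the individual item names are deliberately left out because the text does not assign them to categories.

```python
from statistics import mean

# The four SAMS categories named in the text; each holds three item scores.
CATEGORIES = ("dexterity", "visuo-spatial ability", "operative flow", "judgment")


def sams_summary(scores):
    """Return (per-category means, overall mean of all 12 items).

    `scores` maps each category name to a list of three ratings in 1..5.
    The 1..5 range per item is an assumption based on the overall score range.
    """
    for name in CATEGORIES:
        items = scores[name]
        if len(items) != 3:
            raise ValueError(f"{name}: expected 3 items, got {len(items)}")
        if any(not 1 <= s <= 5 for s in items):
            raise ValueError(f"{name}: scores must lie between 1 and 5")
    category_means = {name: mean(scores[name]) for name in CATEGORIES}
    all_items = [s for name in CATEGORIES for s in scores[name]]
    return category_means, mean(all_items)


# Made-up ratings for a single case, for illustration only.
example = {
    "dexterity": [3, 4, 3],
    "visuo-spatial ability": [4, 4, 3],
    "operative flow": [3, 3, 4],
    "judgment": [4, 4, 4],
}
category_means, overall = sams_summary(example)
```

Reporting the category means alongside the overall mean keeps the category-level detail that targeted feedback depends on.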
In part two, to validate the SAMS as an evaluative instrument, 7 fellows performed microsurgical rat femoral artery anastomoses at the beginning and end of the study period (two procedures per fellow). The surgeries were videotaped, and each video was evaluated by five plastic surgeons using the SAMS questionnaire. The videos were de-identified and the audio was removed so the evaluators were blinded to the identities of the fellows, as well as the time point.
A paired t-test was used to compare changes in the SAMS score for each technical component for each fellow over the study period. Cronbach’s alpha coefficient was used to determine inter-evaluator reliability in both the clinical and laboratory settings for each fellow at each time point. Alpha values of less than 0.6 were considered unacceptable, values of 0.6 – 0.7 were considered acceptable, values of 0.7 – 0.8 were considered good, and values greater than 0.8 were considered excellent. 13 A P-value less than 0.05 was considered significant. All tests were two-sided. The analyses were performed using SAS 9.2 (SAS Institute Inc., Cary, NC) and R (The R Foundation for Statistical Computing).
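For readers unfamiliar with the paired t-test, the statistic is the mean of the within-subject differences divided by its standard error, with n − 1 degrees of freedom. The sketch below uses only the Python standard library and is not the SAS or R code used in the study; the function name and the example scores are hypothetical. It returns the statistic and degrees of freedom only, because converting these to a two-sided p-value requires the Student t distribution (a library routine such as `scipy.stats.ttest_rel` reports it directly).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired observations.

    t = mean(d) / (sd(d) / sqrt(n)), where d = after - before for each subject.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    if len(before) < 2:
        raise ValueError("at least two pairs are required")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Made-up scores for three subjects at two time points (illustration only).
t_stat, dof = paired_t_statistic([1, 2, 3], [2, 4, 3])
```

A positive statistic indicates that scores rose between the two time points. With seven fellows, a per-fellow comparison of this kind would have 6 degrees of freedom.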
Forty-four cases were evaluated in August 2010, 33 in February 2011, and 41 in May 2011. Table 1.1 lists the mean SAMS scores for each of the components during the first two study periods, August 2010 and February 2011. All of the technique component scores, as well as the overall performance and indicative skill scores, increased significantly over time with the exception of tissue handling (p=0.20), motion (p=0.08), and irrigation (p=0.08). Speed improved the most (0.76 difference in mean scores), and tissue handling showed the least improvement (0.36 difference).
Table 1.2 compares the mean scores of the components in the second and third study periods, February 2011 and May 2011. Most of the technique component scores increased slightly between these two intervals. Overall visuo-spatial ability significantly improved (diff=0.28, p=0.01), as did knot technique and suture placement (0.34, p=0.03 and 0.37, p=0.05, respectively). The scores of speed and patency, as well as overall performance and indicative skill scores, demonstrated slight, non-statistically significant decreases (0.07, 0.06, 0.004, and 0.02, respectively). A graphic representation of changes in the mean SAMS scores over all three study intervals is shown in Fig. 1.
The number of tabulated errors decreased from 81 total errors in August to 36 total errors in both February and May. The most common error was unequal stitch bites (Table 2).
Tables 3 and 4 show the pre- and post-training period scores for overall performance and indicative skills. These and all other measurable parameters in the SAMS demonstrated statistically significant increases in skill from the first to the second laboratory session. The inter-evaluator reliability of assessment for each skill component is shown in Table 5. The inter-evaluator reliability was acceptable (0.6<α<0.7) for suture placement, motion, irrigation, and bleeding control and unacceptable (α<0.5) for bimanual dexterity, knot technique, and steps. The inter-evaluator reliability was good (α>0.7) for all remaining skills. In addition to calculating the inter-evaluator reliability for each of the microsurgery skills separately, we also pooled all the skills and found that overall reliability for the laboratory component was acceptable (α=0.67).
The material cost per fellow, per anastomosis for the laboratory component was $127.91. This included $55.71 per rat, $11.20 for 10-0 nylon suture ($5.60 per pack, 2 packs per anastomosis), $61.00 for the technical fee of anesthetizing the rat, and no cost for the videotaping and evaluation.
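The per-anastomosis figure can be checked by summing the listed components. The short Python check below uses `Decimal` to avoid floating-point rounding on currency. The cohort total is a derived figure, not one reported in the text: it assumes exactly 14 anastomoses (7 fellows, two procedures each, as described in the methods) with no repeated or failed procedures.

```python
from decimal import Decimal

# Component costs per anastomosis, as listed in the text (US dollars).
rat = Decimal("55.71")
suture_per_pack = Decimal("5.60")
packs_per_anastomosis = 2
anesthesia_fee = Decimal("61.00")
video_and_evaluation = Decimal("0.00")

cost_per_anastomosis = (
    rat
    + suture_per_pack * packs_per_anastomosis
    + anesthesia_fee
    + video_and_evaluation
)

# Derived, not reported: 7 fellows x 2 procedures each, assuming no repeats.
fellows = 7
procedures_per_fellow = 2
cohort_material_cost = cost_per_anastomosis * fellows * procedures_per_fellow
```

Under these assumptions the components sum to $127.91 per anastomosis and $1,790.74 for the whole laboratory component.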
In this study, a cohort of microsurgical fellows subjected to the same training model demonstrated statistically significant improvements in the first 6 months of the training year in all main SAMS categories of skills and overall performance. During this time, the number of technical errors decreased by more than half. We also demonstrated that these improvements and error reductions tapered off, or plateaued in the second half of the training year, although trends toward improvement were still seen. We further demonstrated that an assessment instrument can be validated in the laboratory by showing that evaluators rated videotaped microsurgical performances with relative consistency. Finally, we found that both the clinical evaluation and the laboratory component can be easily administered at low cost and with minimal inconvenience to a busy microsurgical practice and training program.
Rather than relying on an apprenticeship model of learning, microsurgical training can be broken down into components that can be evaluated, measured, and analyzed. In the clinical portion of this evaluation, fellows demonstrated significant improvement in all measurable areas over the first half of the study period, indicating that early in the fellowship, important technical experience and learning skills specific to performance of a microvascular anastomosis were being imparted, whether actively or passively. In the second half of the study period, improvement in the technical aspects of performing a microsurgical anastomosis was considerably more modest. This perhaps suggests that skill acquisition related to the microvascular anastomosis itself had plateaued, and improvements were taking place in other surgical parameters that were intentionally unmeasured by the SAMS, such as flap design, surgical planning, overall operative sequence, and decision-making. This explanation would be consistent with the educational model of our training program, which concentrates on the mechanics of the anastomosis early and the subtleties of surgical planning and flap design, elevation, and inset later in the year.
The laboratory portion of the study confirmed that SAMS is valid for measuring microsurgical ability because (1) evaluators consistently rated fellows as more skilled at the end of the year, even though the evaluators were blinded to the time the videos were recorded, and (2) evaluators were in acceptable agreement about the performance characteristics of each of the fellows in most skill areas and in each evaluation period. There are some specifics of the scoring distribution that require explanation. Most of the fellow scores occur in the three to four range, and fellows who began with a lower baseline score (e.g., fellows 1, 5, and 6) tended to improve more than fellows beginning with a higher baseline score (e.g., fellows 2 and 7). We hypothesize that fellows who enter the program at a lower skill level can improve relatively dramatically because obvious deficiencies and overtly poor technique can be corrected quickly with simple interventions (penetrate the vessel wall at a right angle, withdraw the needle along the curve, set the knot down square, etc.). Conversely, fellows who begin at a very high level, or have improved to a high level throughout the year, require less gross instruction, and improvements will be, by their nature, smaller and more subtle. This may reflect the asymptotic nature of skill acquisition.
Another observation is that there appears to be a ceiling effect among the study participants at a score of around 4. We propose an explanation: The scoring system from 1 to 5 represents the entire spectrum of potential performances from complete novice (1) to world-class expert (5). Not surprisingly, microsurgery fellows with some technical micro experience at the beginning of the training program will achieve scores above the complete novice, but below world-class expert. It is difficult for a fellow to achieve a score of 5, just as it is difficult to achieve a score of 1. These scores are outliers. Although a good and proficient performance (score of 4) can be consistently achieved by skilled fellows by the end of the fellowship, perfect and expert performances (score of 5) are very difficult to achieve, and this level of achievement may only occur after years in practice, if ever. For this reason, scores of 5 were rare, perhaps again reflecting the reality of skill distribution and the asymptotic nature of skill acquisition.
The foundation of the field of instructional design was laid in World War II, when the U.S. military had to rapidly train large numbers of enlisted men to perform complex technical tasks. Drawing on the research and theories of B.F. Skinner on operant conditioning, training programs focused on observable behaviors.14 Tasks were broken down into components, and mastery was assumed to be possible for every learner, given enough repetition and feedback. Following the war, this training model was replicated in business, industrial training, and in the classroom.15 This type of systematic approach, although still common in the military, has not been applied in any meaningful way to surgical training, in spite of the fact that the two environments are analogous in the sense of consisting of definable tasks with severe consequences if unsuccessfully completed.
In contrast to the military’s systematic approach, there are few reports of novel microsurgical training approaches beyond the use of artificial tubes to simulate small vessels, in vitro animal models, and chicken parts, and even fewer studies validating the utility of these models. 16-18 Some studies have explored, with occasionally promising results, the use of simulators and surgical robots in microsurgery in an attempt to validate trainee competency with other instruments and parameters. 19,20 Although a variety of different models exist to assess surgical technique and operative skills, a validated, reliable, and reproducible assessment of the microsurgical anastomosis, per se, has not been conducted.
The strengths of this study include a robust number of trainees, evaluators, and cases and a controlled validation process for our assessment instrument. Having 7 fellows and 16 faculty members participate in the evaluation process allowed us to generate a robust body of data over a single training period. Since a 1-year fellowship is short, being able to generate these data quickly was critical to the success of this study. The same trainees and evaluators participated in both the laboratory and clinical assessments. This allowed us to rate the consistency with which the assessment instrument was being applied and validate the clinical component of the study.
There were several inherent limitations to this study. First, not all evaluators rated every fellow in each interval, so there may have been some psychometric divergence among evaluators per evaluation period, even if the instrument was generally being applied with consistency. Similarly, there was not an even distribution of evaluators and microsurgical cases. All evaluators reviewed all participating fellows, but did not do so in each of the three study periods. Because of normal variations in individual faculty members’ clinical practices, the number of evaluations was greater among busier microsurgeons. The evaluators in the laboratory consisted of both busy microsurgeons and less busy microsurgeons, as well as both senior and junior faculty. The purpose of this was to ensure that there was inter-rater reliability across a diverse set of faculty characteristics. This consistency gives us confidence that the larger number of evaluations completed by clinically busier faculty did not alter the way the instrument was utilized or skew the results.
Second, every case and the circumstances of every anastomosis are different, which introduces some inevitable variance in the process being evaluated. In addition, some evaluators assessed fewer cases, which leads to missing values and disables certain types of calculations. Finally, expectations about how a trainee should evolve or progress over the course of the year might change the way the trainee is graded. For instance, a trainee may be graded more harshly for a similar performance at the end of the year than at the beginning because of expectations of improvement over that time. In spite of these challenges, the authors thought it was important to evaluate the trainees in actual clinical scenarios.
Tangible and immediate benefits of the proposed training program include the ability to provide specific, periodic feedback to our fellows, rather than anecdotal reports or arbitrary and retrospectively biased ordinal grades. The latter type of comment is generally unhelpful, and occasionally hurtful, particularly when the trainee is offered no method of remediation. The next step in our maturation as a training program will be to design interventions that target the deficiencies exposed by the SAMS assessment and then evaluate the effectiveness of the interventions in generating improvement. Although we believe that this study represents a considerable step forward in validating a microsurgical assessment instrument and providing specific feedback, we clearly have a long way to go before we reach the theoretical and actual goals of standardizing curricula, assessing learning needs, creating benchmarks, defining measurable outcomes, and creating targeted feedback and interventions.
This study is the first, modest step towards achieving those goals. Work hour restrictions and the increasing use of metrics to assess quality will bring the issue of trainee evaluation to our doorstep whether we like it or not. We are fortunate in microsurgery in that we have a set of clearly defined tasks that are amenable to this type of analysis, and it behooves microsurgery training programs to engage in this process in an effort to set our own standards, so that others do not set them for us.
The SAMS is a valid instrument for assessing microsurgical skill, with acceptable inter-evaluator reliability. Over the course of a year’s training, microsurgical fellows’ skills increased significantly in the first half of the year, but not the second. Suture placement, knot tying, and dissection continued to show modest improvement throughout the year. The shape of this learning curve is commensurate with a microsurgical curriculum that emphasizes anastomotic technique in the early part of the training and progresses to the subtleties of flap design and surgical planning in the latter part. We have moved one step closer to the goal of establishing training objectives, evaluating skill level, identifying deficiencies, and targeting feedback and interventions to enhance microsurgical training.
Funding Sources: This research is supported in part by the National Institutes of Health through MD Anderson’s Cancer Center Support Grant CA016672.
Disclosure: The authors have no disclosures or conflicts of interest relevant to the content of this manuscript.
Products Mentioned: SAS 9.2 (SAS Institute Inc., Cary, NC) and R (The R Foundation for Statistical Computing)