This study describes the tools residency programs use to evaluate trainee competence. We find that programs employ a large number and variety of tools, averaging 4.2–6.0 tools per competency. This finding is consistent with ACGME guidelines, which encourage programs to use a comprehensive system of evaluation employing more than 1 tool, as no individual competency can be thoroughly judged with a single instrument.1
Foundational Tools are the most popular instruments for resident competency assessment; these include rating-based forms such as the ABIM end-of-rotation form (used by a low of 77% of programs for Systems-based Practice and a high of 81% for Patient Care), “home grown” local forms (range of use 49–55%), and the In-Training Exam (range of use 24–91%).
It is worth noting that the ABIM form was the single most popular tool for all 6 competencies. However, the ACGME recommends global rating forms only for evaluating the competencies of Practice-based Learning and Patient Care, and even for these 2 competencies, rating forms are considered only a “potentially applicable method.”10
Global evaluation forms provide a retrospective, subjective assessment, usually at the end of a clinical experience, rather than an objective measure of specific skills and tasks. Ease of use may explain their popularity despite documented problems with discriminatory ability, reliability, and validity.11–14
Given the current disconnect between recommended and actual practice in the use of global rating forms, we suggest either a “culture” change in how programs use these forms or additional support from the ACGME to improve their psychometric quality and to train raters in their proper use.
Non-faculty perspectives, such as peer and nurse evaluations, were also commonly used. Although we did not specifically ask, we assume that local forms, peer evaluations, and nurse evaluations are all scale-based rating forms that have not been psychometrically validated. In addition, there appears to be heavy reliance on “other” methods of competency assessment (used on average by 21% of programs).
More than 10 years ago, it was predicted that direct observation of trainees would prevail over rating scales as a means of evaluation in graduate medical education.11
Our study does not support this notion. Direct observation of trainees occurred quite often via the mini-CEX (used by 90% of programs to assess Patient Care) but very infrequently via the standardized patient/OSCE (range of use 13–28%), video of patient encounters (range of use 4–17%), or simulations (range of use 2–6%). Similarly, Practice- and Data-based tools such as chart-stimulated recall (range of use 5–16%) and portfolios (range of use 21–34%) were infrequently used. It should be noted, however, that direct observation may be the basis on which rating-based forms are completed; we did not ask respondents to comment on the data they used to complete their rating-based forms.
Certain tools may be used very infrequently because they are labor or resource intensive. We confirmed this suspicion by asking a subset of the APDIM Survey Task Force membership (program directors and associate program directors) to rank tools in increasing order of difficulty. Although this was not a true validation, the task force rated videos, computer simulations, and standardized patients/OSCEs as the most burdensome tools for evaluating residents. Not surprisingly, the results from our survey showed these tools to be very infrequently used by programs nationwide.
In terms of compliance with ACGME recommendations, the data reveal that just over half of all programs (53%) are using at least 1 of the “most desirable” methods to measure all 6 competencies. These data are encouraging, given that the ACGME only began assessing competency evaluation in 2002.
Not surprisingly, very few programs were able to employ all of the “most desirable” tools to evaluate each competency comprehensively (1.5–14%). Note that competencies with the fewest “most desirable” tools (e.g., 3 tools) were easier to assess comprehensively. Future studies should ascertain whether using 1 of the “most desirable” tools encourages the use of other “most desirable” tools, perhaps creating a “change threshold” for a particular competency.
The ACGME does not expect programs to evaluate every domain of every competency. Instead, it allows programs to decide which domains are most important to their locality and which tools are most feasible to use.1
Although programs are still a long way from comprehensively evaluating each competency, our results show that many programs use multiple tools for competency assessment.
Our bivariate analysis finds that programs using the “most desirable” methods of evaluation for all 6 competencies have more full-time equivalent (FTE) support staff per resident than programs using the “most desirable” method for fewer than 6 competencies. The association with having more teaching faculty per resident approached statistical significance.
It is widely recognized that programs and program staff have struggled to devote the added time and effort needed to effectively teach and evaluate the competencies since their required implementation began in 2002.15
Our data demonstrate that programs evaluating all 6 competencies with a “most desirable” method have 1 FTE support staff per 10 residents and 2.8 teaching faculty per resident, whereas programs unable to evaluate all 6 competencies with a “most desirable” method have 1 FTE per 14.3 residents and 2.0 teaching faculty per resident. It is interesting to note that neither the experience of the program director nor the time he or she spends on administrative activities was related to the number of competencies being evaluated with a “most desirable” method.
Several strengths and limitations of this study should be acknowledged. This is the first nationwide analysis describing the current state of resident evaluation processes in graduate medical education in Internal Medicine in the United States. We report a reasonably high response rate and provide prevalence data on the use of tools to measure trainee competence. We caution the reader that our data represent process measures, not outcomes: our study shows how trainee competence is assessed, not whether competence has been achieved.
Other limitations include the exclusion of 4 tools in the ACGME Toolbox of Assessment Methods from our survey (record review, checklist, oral exams, and procedure/case logs). These omissions limit our ability to comment on compliance with ACGME “most desirable” methods. However, we did allow programs to indicate “other” methods of competency assessment and found that these excluded tools were, in fact, infrequently used (mentioned in only 11% of the few comments made), which supports the validity of our results.
We cannot infer why certain programs are or are not compliant with ACGME recommendations. We cannot comment on social desirability bias and cannot verify the accuracy of responses provided to us by survey participants. Finally, and importantly, we cannot comment on the competence with which programs are using each tool. This analysis will be important for future studies on evaluation methods.
Competency evaluation in graduate medical education is still in its infancy. With the help of the ACGME, programs have gained a strong foothold in the complex task of assessment. However, we remain far from a comprehensive evaluation of trainee competence and are not yet adept at using all the tools the ACGME suggests for evaluating our trainees. Such evaluation is imperative if medical education is to join the quality movement that aims to provide high-quality care through high-quality physicians.16