Faculty assessment of students’ professionalism is often based upon sporadic exposure to students. Peers are in a unique position to provide valid judgments of these behaviors.
(1) To learn if peer assessments of professional conduct correlate with traditional performance measures; (2) to determine if peer assessments of professionalism influence the designation of honors, and (3) to explore student and faculty opinions regarding peer assessment.
Internal Medicine Clerkship at Southern Illinois University.
Since 2001 anonymous student peer assessments of professionalism have been used in assigning clerkship grades.
Peer assessments of professionalism had weak, though significant, correlations with faculty ratings (r=0.29), performance on the NBME subject test (r=0.28), and performance on a cumulative performance assessment (r=0.30), and did not change the total number of honors awarded. A majority of students (71%) felt comfortable evaluating their peers, and 77% would keep the peer evaluation procedure in place. A majority of faculty (83%) indicated that peer assessments added valuable information.
Peer assessments of professional conduct have little correlation with other performance measures, are more likely to have a positive than a negative influence on final clerkship grades, and have little impact on the awarding of honors.
Assessment of students’ professionalism is difficult to do well.1,2 Peers, who interact with each other frequently and observe behaviors to which faculty are not privy, are in a unique position to provide valid judgments of behavior.1–5 Peer assessments allow teachers to benefit from additional perspectives and have been recommended as a reliable method of professional assessment.5 Surprisingly, there are few successful peer assessment systems in place in US medical schools, and it is rare that peer assessments count toward summative grades. Many of the extant programs ask learners to report on a limited number of peers (either randomly selected or self-selected).6 Trainees often dislike and resist peer evaluations, stating that they hinder team relationships, promote competitiveness, and may cause harm by leading to an unfavorable grade.7–9 Students generally agree that a peer assessment system should: (1) be 100% anonymous, (2) provide immediate feedback, (3) focus on both unprofessional and professional behaviors, and (4) be used formatively to reward exemplary behavior and address repetitive professional lapses.3
There is limited published information on the use of peer assessments as part of a formal, summative evaluation in medical schools. At Southern Illinois University School of Medicine, use of peer assessments of professional behavior in determining final grades in the internal medicine clerkship has been in place since 2001.
The objectives of this study are to: (1) learn if peer assessments of professional conduct correlate with other performance measures, (2) determine if peer assessments of professionalism influence the assignment of clerkship honors, and (3) explore student and faculty opinions regarding peer assessment.
Southern Illinois University School of Medicine is a community-based medical school with 72 students per class. Students participate in clinical clerkships during their third year, after participating for 2 years in a small-group, problem-based learning (PBL) curriculum. At the end of each PBL group, students provide verbal formative self and peer evaluations.10 In the medicine clerkship (http://www.siumed.edu/medicine/clerkship/index.htm) students interact with their peers, residents, and faculty on specialty and general medicine services, as well as in groups during didactic conferences, standardized patient settings, and reports.
Student performance in all core clerkships is evaluated in three categories: (1) Clinical Performance, (2) Knowledge and Clinical Reasoning, and (3) Noncognitive Behaviors. The category of Noncognitive Behaviors addresses attributes associated with professional conduct, such as self-motivation, independent learning, interpersonal relationships, dependability, and integrity.
A new evaluation system for the internal medicine clerkship was implemented beginning with the Class of 2001, in response to student feedback that the evaluation process should be more systematic and objective. A formula prioritizing several performance areas was developed by the clerkship director, then modified and approved by the medicine faculty after discussion. In the new system, each area—skills, knowledge, and noncognitive behaviors—contributes equally to the final clerkship grade. For each category, specific performance activities are assigned weights that determine the final grade (Table 1). Student peer assessment of professional attributes was formally included as one of the factors weighed in determining the final evaluation in the category of Noncognitive Behaviors. Peer evaluations account for 20 percent of the grade in the Noncognitive Behaviors category and 7 percent of the overall clerkship grade.
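The 7 percent figure follows from the two weights stated above. As a minimal sketch of that arithmetic, assuming the three categories are weighted exactly one third each (the names below are illustrative and not part of the clerkship's grading formula):

```python
from fractions import Fraction

# Assumption: three categories contribute equally, so each is 1/3 of the grade.
CATEGORY_WEIGHT = Fraction(1, 3)
# From the text: peer evaluations are 20 percent of the Noncognitive Behaviors category.
PEER_SHARE_OF_CATEGORY = Fraction(20, 100)


def overall_share(category_weight, share_within_category):
    """Return a component's share of the overall grade."""
    return category_weight * share_within_category


peer_overall = overall_share(CATEGORY_WEIGHT, PEER_SHARE_OF_CATEGORY)
print(f"{float(peer_overall):.1%}")  # 6.7%, which rounds to the reported 7 percent
```

Exact fractions are used so that the result (1/15) is not affected by floating-point rounding before it is formatted.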
During orientation, the clerkship director explains the evaluation process, and students are told they will anonymously assess the professional behaviors of their peers at the conclusion of the clerkship. They are instructed to confine their assessments to noncognitive domains and advised to complete an evaluation of a peer only if they feel they have had sufficient interaction during the clerkship to form an accurate assessment. At the end of the clerkship, the clerkship administrator, whom students view as friendly and supportive, repeats these instructions and gives each student a folder containing separate evaluation forms for each peer in the clerkship. Students do not complete a self-assessment.
The peer evaluation instrument (Appendix) has the same items, descriptors, and 5-point rating scale as the form faculty have used for more than 20 years. The student form was not pilot tested.
To accomplish the study objectives, we conducted an analysis of existing performance records for the classes of 2001–2005 in which peer ratings of 349 internal medicine clerkship students were compared with clinical evaluations completed by faculty, NBME subject examination scores, performance on a senior post-clerkship performance examination, and election at any time to the Alpha Omega Alpha Honor Society. In addition, senior students in the class of 2004 and the Internal Medicine faculty voluntarily and anonymously completed author-generated opinion surveys about the peer assessment process. The student surveys were distributed and collected by a sophomore student volunteer during a single large group meeting; no faculty were present for the session. Faculty surveys were distributed by mail and collected by a research assistant.
This project received approval from the Southern Illinois University School of Medicine Springfield Committee for Research Involving Human Subjects.
We analyzed data collected from 349 students (100% of students) over 5 years. Statistical analysis consisted of descriptive measures, correlations, paired t-tests, and analysis of variance, with statistical significance set at the five percent level. The reliability of the three-item peer evaluation tool was measured with Cronbach’s alpha and found to be 0.89, indicating strong internal consistency on a per student basis. An average of 18 students was in each clerkship, and we collected a mean of 12 anonymous peer evaluations per student (range 6–19).
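For readers unfamiliar with Cronbach's alpha, the statistic can be sketched as follows. This is an illustration of the standard formula only, not the analysis code used in the study. The layout of the input (one row per completed form, one column per item) and the example ratings are assumptions made for the illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per form, one column per item)."""
    if len(rows) < 2:
        raise ValueError("need at least two completed forms")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every form needs the same number of items (at least two)")
    # Sample variance of each item across forms.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of the summed score across forms.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings on a 5-point scale, three items per form (not study data).
example_forms = [
    [5, 5, 4],
    [4, 4, 4],
    [3, 4, 3],
    [5, 4, 5],
    [2, 3, 2],
]
print(round(cronbach_alpha(example_forms), 2))
```

Alpha approaches 1 when the items rise and fall together across forms, which is the sense in which a value of 0.89 indicates strong internal consistency.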
Mean peer ratings (4.18) were lower than mean faculty ratings (4.27, p<0.001). Peer ratings of professional conduct had a weak correlation with faculty ratings (r=0.29, p<0.001) (Figure 1). Weak but statistically significant (p<0.001) correlations were also noted between peer ratings of professional behaviors and performance on the NBME subject test (r=0.28), faculty ratings of clinical skills (r=0.28), election to AOA (r=0.24), and performance on a senior post-clerkship competency exam (r=0.30).
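The correlation coefficients reported here are Pearson r values. The sketch below shows how such a coefficient is computed and what a value near 0.29 implies about shared variance. It is a generic illustration, not the study's analysis code, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least two values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / sqrt(sxx * syy)


# Squaring r gives the proportion of variance the two measures share.
shared_variance = 0.29 ** 2
print(f"{shared_variance:.2f}")  # 0.08
```

By this measure, a correlation of 0.29 means that peer and faculty ratings share roughly 8 percent of their variance, which is consistent with describing the relationship as weak.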
Students wrote over 1,800 comments on their peer evaluation forms. Using an iterative process, two authors (RK and DR) reviewed and categorized the comments into strengths (1,545) and weaknesses (316), subcategorized them into themes, and resolved any disagreement by consensus (Table 2). Themes addressing team communication, interpersonal interactions, and dedication to patients or educational events were most often spontaneously reported. There were often striking differences between faculty comments and peer comments about the same student that clearly illustrated a student’s unique perspective (Table 3).
The results of the opinion surveys (Table 4) indicated that students and faculty felt the peer assessments added valuable information to evaluations, were equally or more accurate than faculty assessments, should continue as part of the evaluation process, and should remain anonymous.
Students and faculty wrote 181 comments, which were categorized in a manner similar to that described for the comments on the evaluation forms (Table 5). The most common theme reflected concern about bias-driven remarks: either that friends would favor one another or that personality conflicts would interfere with honest reporting.
Overall, with peer assessments included, 32 students (9%) received a better grade for the clerkship—an ‘exceeds expectations’ rather than ‘satisfactory’ rating in the category of Noncognitive Behaviors. Peer assessments accounted for honors for two students and the denial of honors for two students; however, the total number of honors awarded during the 5-year study period was unchanged. No student failed or remediated the clerkship because of peer assessments, but those with low ratings or troublesome comments received verbal feedback and counseling. Comparing the 5 years before to 5 years after the implementation of the new grading system, there was minimal impact on students who received ‘exceeds expectations’ for Noncognitive Behaviors (55% before, 53% after) or designation of honors (22% before, 19% after).
The weak but significant correlation between peer and faculty evaluations of students’ professional conduct supports the notion that students have a unique viewpoint that is valuable in assessment of noncognitive behaviors such as professionalism. Indeed, students are tougher graders than faculty, despite 41% saying they provided inflated ratings of their peers. Likewise, the lack of a strong relationship between peer assessments and traditional performance measures reinforces the belief that students contribute additional information, and can discriminate between their peers’ academic performance and noncognitive attributes. These findings, along with the discordant faculty and student comments, are factors that speak to the value of peer assessments.
To our knowledge, there is only one other published report describing a core clerkship evaluation system where peer assessment of professional conduct is used for summative evaluation.2 Similar to our findings, the authors found that students could reliably evaluate their peers with a high degree of internal consistency. In contrast to our findings, they found that peer evaluations were an important reinforcer of faculty evaluations and were uncertain whether peer evaluations offered a source of unique information.
Peer assessment of professional behavior has been generally regarded as an appropriate formative, rather than summative, assessment method for medical students because it permits timely corrective feedback of problem behaviors and encouragement of a student’s strengths.11,12 In contrast, there are questions about the reliability of summative peer assessments and their usefulness in high-stakes settings.13 Our experience indicates that the majority of faculty and students at our institution support student peer assessments as a meaningful contribution to the final clerkship grade. We believe that the anonymity of the evaluations favors acceptance, along with the fact that the weighting of the peer evaluations is modest. Though not specifically explored on the surveys, it is possible the environment of our institution encourages acceptance of the process since students regularly complete global formative peer evaluations in the first 2 years of the curriculum. It may be that familiarity with the process, even when the stakes are higher, increases their comfort.
Some students and faculty expressed reasonable concerns about the system, and we share their worry about bias, unfair comments, and lack of accountability. The survey did not inquire if students rated their peers inappropriately low. These considerations, along with the written comments regarding negative bias, raise the possibility that deflated ratings were provided, although this is impossible to determine. The degree to which dishonesty and self interest influence peer ratings, as well as ways to safeguard students from unfounded comments and ratings, are areas for future investigation.
Despite these concerns, we believe that by recognizing and valuing student opinions in a process that counts, we demonstrated trust of their judgments and respect for them as professional colleagues. In addition, we feel the experience of being involved in a peer review process in school prepares them for similar obligations as residents and practicing physicians.
In summary, we feel the positive benefits of a summative peer assessment process are important, and the balance of weights we have designed is reasonable.
There are several limitations to this study. First, it was conducted at one medical school, and our findings may not generalize to other institutions. This may be especially true given the relatively small class size and cohesiveness of our school. Second, the first part of the study was an analysis of existing data, and third, only one class out of the five was surveyed. We found the survey data useful, but a larger sample size would have lent robustness to the results. In addition, talking to students in focus groups may have generated more insight into student opinion and should be considered as a supplemental procedure in the future. Last, though 11 evaluations have been suggested for reliable assessment of professional behavior,14 we did not manage students with fewer than 11 differently than those with more. We re-examined the 97 (28%) students with fewer than 11 evaluations and found that peer evaluations affected the grades of seven: five (1%) positively and two (1%) negatively, with no change in the granting of honors.
We found that there was little correlation of peer assessments of professional conduct with other performance measures, and that students and faculty accepted peer assessments of professionalism as a meaningful contribution to their grade. Peer assessments are more likely to have a positive rather than a negative influence on the final clerkship grade and have little impact on the designation of honors. We believe they provide valuable information in the assessment of professional attributes of students and should continue to be used.
We thank Kate Beasley for her valuable help with data entry and manuscript preparation. There was no internal or external funding support for this study.
Conflict of Interest: None disclosed.
Student Peer Evaluation Rating Form