We have developed a method of assessing student reflections using standardized video-cases and a scoring rubric, applied it to 270 fourth- and fifth-year undergraduate medical students, and demonstrated that the resulting reflection scores have acceptable psychometric properties, including the ability to discriminate, inter- and intra-rater reliability, and case-specificity.
Replacing situations unique to individual students with standardized video-cases provided a common base for assessment without limiting variance between reflection scores. This variance can be attributed to two factors. First, students have unique frames of reference influenced by their individual prior experiences, knowledge, and beliefs [44], which lead them to reflect on different aspects of experience, pose different searching questions, and identify different learning goals. Second, the scoring items of StARS® identify the process of reflection (e.g., the ability to ask searching questions or to draw conclusions), and this process varies independently of the content of reflection, which is related to the triggering situation [41].
The inter-rater reliability of skills lab physicians, who had been trained for only 30 minutes, was sufficient. This finding reflects favourably on the use of guiding questions to structure reflections and on the quality of the scoring rubrics. Each rater took about three hours to score 40 student reflections (roughly 4.5 minutes per reflection), which suggests that StARS® is a practical instrument for evaluating student reflections and providing feedback.
Feedback about reflection is becoming increasingly important as the idea of reflection as a strictly individual internal process is changing into a notion of a thinking process that needs to be complemented with external feedback. This increased focus on external information is grounded in concerns about individuals lacking accurate introspection skills to fuel reflections and in recognition of a need to verify one’s reflective thoughts and frame of reference against a broader perspective [45]. Discussing experiences and the reflective thoughts that accompany them is key to bringing an internal process and external information together. Multiple formats have been proposed, such as critical friends, formative feedback from supervisors, and peer feedback [46]. However, interacting effectively about reflections requires individuals to learn to verbalize their reflective thoughts. Our proposed method of assessment through facilitated reflection may be beneficial for this learning process, as it structures reflections by means of guiding questions and provides feedback on essential aspects of the process of reflection as StARS® items are scored.
The generalizability study identified students, cases, and the interaction between them to be the main sources of variance in reflection scores. The variability between students is evidence of systematic individual differences in the quality of reflection and is not to be seen as error [49]. Variance between cases (case specificity), however, was an important source of error. The D study showed that increasing the number of cases had a much greater effect on the G coefficient than increasing the number of ratings. The content of cases and the reflections that ensue from them have a complicated relation. According to Schön [11], a complex, challenging context best stimulates reflection. We tried to match video-cases to students’ expected level of competence, but it is likely that individual students found different levels of challenge in the same cases and were therefore stimulated differently by them. As well as case-related effects, Kreiter and Bergus [50] recommended considering occasional influences like momentary insights and confusions as possible confounders. Despite those considerations, three to four cases (depending on the number of ratings) were enough to obtain the G coefficient of 0.80 needed for high-stakes decisions in fourth-year students, though fifth-year students needed over six cases [43]. This result suggests that the method is better used spread over time during a course than in one-day high-stakes exams, as students need approximately 1 hour to view a case and reflect on it.
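The D-study reasoning above can be illustrated with a minimal sketch. It assumes a simplified students × cases × ratings design with a relative G coefficient, which may differ from the design actually analysed, and the variance components are hypothetical values chosen only for illustration; they are not the estimates from this study. The function names `g_coefficient` and `min_cases` are likewise illustrative.

```python
# Minimal D-study sketch under assumed (hypothetical) variance components.
# Assumed relative G coefficient for a students x cases x ratings design:
#   G = var_student / (var_student + var_student_case / n_cases
#                      + var_residual / (n_cases * n_ratings))


def g_coefficient(var_student, var_student_case, var_residual, n_cases, n_ratings):
    """Return the projected G coefficient for a given number of cases and ratings."""
    error = var_student_case / n_cases + var_residual / (n_cases * n_ratings)
    return var_student / (var_student + error)


def min_cases(target, n_ratings, var_student, var_student_case, var_residual,
              max_cases=50):
    """Return the smallest number of cases reaching the target G, or None."""
    for n_cases in range(1, max_cases + 1):
        g = g_coefficient(var_student, var_student_case, var_residual,
                          n_cases, n_ratings)
        if g >= target:
            return n_cases
    return None


# HYPOTHETICAL variance components, not taken from the study.
HYPOTHETICAL = dict(var_student=1.0, var_student_case=0.5, var_residual=0.3)

if __name__ == "__main__":
    for n_ratings in (1, 2):
        n = min_cases(0.80, n_ratings, **HYPOTHETICAL)
        # Each case is assumed to take about 1 hour of student time (from the text).
        print(f"ratings per case: {n_ratings}, cases needed: {n}, student hours: {n}")
```

In this sketch the student-by-case term is divided only by the number of cases, so adding ratings cannot reduce it. That is one way to see why adding cases tends to raise the G coefficient more than adding ratings.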
Whilst the standardized context of video-cases is useful for training and assessment purposes, it also introduces a limitation. The ultimate aim of reflection is to learn from experiences so future actions can be more purposeful and deliberate [16]. In real life, students choose which experiences to reflect on, related to their individual development as physicians-to-be and life-long learners. Fueled, as they are, by less personal and meaningful experiences, reflections based on standardized video-cases might have a lesser impact on individual learning. That disadvantage may, however, be offset by the advantages of giving feedback on reflection that is informed by detailed knowledge of the triggering situation.
It could be argued that using a 4-point scale in StARS® (0, 1, 3, 5) limits the diversity of reflection scores and hence discrimination between students. Our findings do not, however, support that claim, as scores ranged from 0 to 30 with standard deviations above 4.0 in each year and for each case. Reflection scores were calculated as the sum of the scores on the 6 items in the rubric. That had the benefit of showing differences in students’ overall ability to reflect but could also hide important differences between students with similar total scores. Totally different patterns of item scores, resulting from students’ diverse reflection strategies, could result in similar aggregate scores.
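The point about aggregate scores can be made concrete with a short sketch. It uses only the scale values (0, 1, 3, 5) and the 6-item sum described above; the two item-score profiles are hypothetical and the items are left unnamed, since the individual rubric items are not listed here.

```python
# Sketch: different item-score patterns can yield the same total score.
# The profiles below are HYPOTHETICAL examples, not observed student data.
from collections import Counter
from itertools import product

SCALE = (0, 1, 3, 5)   # scale values per item, as stated in the text
N_ITEMS = 6            # number of rubric items, as stated in the text


def total_score(item_scores):
    """Sum the six item scores after checking they are valid scale values."""
    if len(item_scores) != N_ITEMS or any(s not in SCALE for s in item_scores):
        raise ValueError("expected six item scores drawn from (0, 1, 3, 5)")
    return sum(item_scores)


def patterns_per_total():
    """Count how many distinct item-score patterns produce each total."""
    return Counter(sum(pattern) for pattern in product(SCALE, repeat=N_ITEMS))


# Two hypothetical students with very different item profiles.
profile_a = (5, 5, 0, 0, 1, 1)   # strong on two items, weak on the rest
profile_b = (3, 3, 3, 1, 1, 1)   # moderate on most items

if __name__ == "__main__":
    print(total_score(profile_a), total_score(profile_b))
    counts = patterns_per_total()
    print("patterns giving the same total as profile_a:",
          counts[total_score(profile_a)])
```

Reporting the item profile alongside the total would be one way to keep such differences visible when giving feedback.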
It could be questioned whether the 6-item structure of StARS® adequately represents the process of reflection. In fact, we reviewed the literature very carefully to search for items that were common to the various widely-used models/theories of reflection to develop the scoring rubric [10]. Use of those common items to construct StARS® is an important factor contributing to its validity.
Medical students have a constant stream of encounters with colleagues, supervisors, patients, their families, and other health care workers. This continuous series of interrelated events, and the reflections they trigger, are wide open to further research. The aim of the present study was to develop a method of meeting this complex educational challenge under well-defined, standardized lab conditions. Comparison with the learners’ ability to reflect in more complex and authentic situations in real life is the next challenge. Further research, however, will have to identify how to standardize the stimulus for these authentic reflections and how to make it possible for an assessing third party to observe them in whole populations of students. Furthermore, future research could focus on the relation between acquired reflection scores and academic or medical performance, since empirical evidence about the effects of reflection on practice remains scarce [21].