The evaluation of the categorization scheme showed that it has good interrater reliability, feasibility, and responsiveness. The interrater reliability score indicates that the raters interpreted and applied the coding scheme consistently. According to the feasibility test, the time used for coding was reasonable. Finally, the responsiveness test indicated that changes in the level of reflection can be captured by this assessment method, and that students' levels of reflectivity increased over time during the internship course.
Measuring the level of reflection is difficult. Hence, the validation of the categorization scheme is a concern in this study. Did the scheme really measure reflection, or was it the students' ability to express thoughts in written form that was assessed? Some students can probably be reflective in their work and have a reflective thinking process, but might not be able to formulate this in a short written essay. Some students may also be resistant to writing this kind of assignment regardless of their levels of reflection. This might be reflected in the poor response rate during the first semester, when the assignment was not part of the curriculum. Only 20.5% (17 of 83 students) chose to write the essays at both the start and end of the course. This resistance could be a result of a long schooling period during which priority had been given not to personal reflection on knowledge, but rather to the repetition of facts.
Since the scheme was used as a research tool in this study, the feasibility test does not include the time spent giving feedback to students about their reflective skills and discussing their progress. When essays are used as part of a reflective curriculum in routine education, the assessment should be accompanied by discussions with tutors and peers to make full use of the learning opportunity. This will add to the time taken by the grader on each paper.
Schön was among the first to describe and propose reflective thinking as the basis for learning in the professional setting. His theory is well known and has been used in several different educational settings.18,25,26 Because Schön's theory about the reflective practitioner was chosen as the theoretical framework and combined with Mezirow's way of describing different levels of reflection, the scheme used in this study was developed on a solid theoretical basis. The foundation for our coding scheme was a test already developed by Kember et al. They showed that their coding scheme, based on Mezirow's theory, is valuable when assessing students' levels of reflection.26 Several studies have used reflective journals and showed that, by using predefined levels, different raters can reach the same result.25,26,28 However, Wong et al experienced difficulties with reliability when grading using several different levels instead of only 2 levels,25 although these were resolved by Kember et al in later work.26 Still, validity is an important concern. Using Kember et al's existing coding scheme and illustrating the different levels of reflection with examples from the written pilot essays has strengthened its validity for the pharmacy practice context. The examples we used were of an emotional character, but other examples could have been used equally well, for instance, examples focusing on pharmacotherapeutic dilemmas. The main purpose of the examples was to better guide the raters.
Our study indicates that the modified scheme is a valid way to measure reflection, but further research is warranted to confirm this. This includes determining whether the rating generated by applying the scheme is associated with related skills such as critical thinking, and whether it is related to factors such as working experience and pharmacy working environment. Since there is a prevailing assumption about the relationship between reflective skills and professional outcomes in pharmacy practice (for instance, patient counseling skills), the scheme's predictive validity would also be important to determine. Further, external validity should be strengthened by repeated research at different schools of pharmacy in different counties.
The change in the interrater reliability score from 0.59 for the essays from fall 2005 to 0.65 for the spring 2006 essays suggests, but does not prove, that there is a training effect and that rating accuracy increases with experience. Both raters also reported improved confidence in the second categorization, and we therefore strongly recommend pilot testing in order to have material to analyze and discuss before doing the actual rating. The interrater reliability was classified as good and, considering the difficulties in measuring a subjective outcome such as the level of reflection, the result was very good.26
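As an illustration of how an agreement score of this kind can be computed, the following is a minimal sketch that assumes the statistic is an unweighted Cohen's kappa for 2 raters; this section does not name the statistic, so that choice is an assumption. The function name `cohens_kappa` and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same essays.

    Assumption: each rating is a reflection level (for example 1 to 6).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    n = len(ratings_a)

    # Observed agreement: share of essays given the same level by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over levels of the product of the raters' marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical levels for ten essays (illustrative only, not study data).
rater_1 = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]
rater_2 = [1, 2, 3, 3, 3, 4, 4, 4, 5, 2]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Because the levels are ordered, a weighted kappa that gives partial credit for near misses could be a reasonable alternative; the unweighted form is used here only to keep the sketch short.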
The feasibility test showed that this scheme would be a reasonably fast method of assessing the level of reflection in written essays, which is very important if the method is to be implemented in teaching settings. As mentioned earlier, the time measured was only for the grading procedure and is likely to increase if the scheme is used in formative assessments.
There is also a difference in the time used by the 2 raters. The raters used slightly different processes for grading and, according to the raters themselves, differences in teaching experience and reading speed contributed to the difference in time required. Previous experience of rating large numbers of essays and other examinations might have made the decision process faster for AW. However, these hypotheses have to be formally tested, and there might also be other factors that explain this difference.
To determine accurately the reasons behind the differences in time used, more raters are needed. Nevertheless, our study indicates that although the rating procedure and the time used are likely to vary between raters, the interrater reliability is good.
The students' reflective skills increased during the internship. This is consistent with our hypothesis for the responsiveness test. Reflection would be expected to increase when students, for the first time in their education, apply theoretical knowledge in real situations, write reflective diaries, and have reflective discussions with their tutor.17,33
The increase in level of reflection is probably not only due to the reflective curriculum during the internship course, but also due to several other factors. There might, for example, be a training effect among the students when completing their second essay, which possibly could contribute to the increased level of reflection. In this study the training effect was minimized by not providing any feedback to the students between the start and end essays. To conclusively discriminate between the instrument's responsiveness to true change in reflectivity and mere instrument sensitization, a controlled study would be needed.
The findings that no student reached the highest level of reflection and that the mean level of reflection was fairly low are consistent with other studies.18,22,25 We found that the highest level of reflection (level 6) is difficult to detect with written essays, since the respondents have to prove that the reflection has altered their way of approaching different problems and that this understanding has been internalized into their professional understanding. This is also discussed by Kember.26 However, in the pilot study, one student wrote in such a manner that we could categorize the essay at the highest level.
Reflective essays and journals have also been used in earlier studies.14,16,22,25,27 We used a modified version of Kember's categorization scheme for assessing pharmacy interns' levels of reflection. This scheme is exemplified by situations from pharmacy practice and has 6 levels that build upon each other. The essay assignment used is also straightforward and easily integrated into the internship course. The topic of the assignment – communication and patient counseling – was intentionally selected because pharmacy students and pharmacists always have opinions about this subject. It is therefore usable both as a baseline measure, before any educational interventions, and as a final outcome. By using a scheme with well-defined levels of reflection, assessing an intangible skill such as reflection is possible. The progress of learning professionalism is often hard to assess, and this scheme for categorization of written essays can be a valuable complement in helping both the student and the tutor to assess and develop reflective skills. It cannot, however, replace the professional eye of a trained pharmacy tutor, and it has to be carefully implemented. Although it can be used as both a formative and a summative assessment, reflective writing may be affected by the assessment situation.10 In composing essays that are to be assessed, students may conform their writing to what they think is expected of them rather than giving free scope to their own, independent thinking.10 It is also important to ensure student buy-in.10 Efforts have to be made to explain the purpose of reflective writing and its function as a learning experience.