Introduction of the formative course format resulted in both improved student performance and increased positive affect. In the original version of the course, student performance was already at a high level acceptable to instructors (Kitchen et al., 2003). Moreover, this level had remained constant over several semesters, suggesting that the pedagogy used was consistent and well established. Thus, it was both exciting and significant to discover that further, substantial gains in performance could be realized by changing the system of examination and grading in the course. This improvement was complemented by positive student attitudes, especially during the third year. These outcomes developed over the 3 yr of this study as various aspects of the course were tailored to meet student needs as informed by evaluation data. During this process, a number of key components were identified that had a noticeable, favorable impact on student performance and affect.
Communication and trust between the instructor and students were central facets of the course. The instructor had to trust that students would take the weekly assessments seriously, and students had to trust that the weekly assessments would be aligned with the final exam. Frequent comments by students suggested that, at first, they were uncertain whether they could achieve proficiency in data interpretation. The instructor had to provide encouragement and reasons for students to remain positive until they could see the benefits of their efforts. For instance, he informed them that students in past semesters had succeeded at data interpretation and that the process would become easier over time. The instructor also reminded students that helping each other would not be penalized; grading was criterion based, and students were not competing with peers for a grade. Instructors attempting a similar course layout must recognize that trust can be a troubling issue for students at first.
Another important component of the course was control. We suspect from conversations with students that one source of frustration for students is the feeling that they do not have control over a situation. Because students in this course were acquiring a new skill, they did not yet have command of data interpretation. We allayed student apprehension about this new skill by delaying judgment of performance in the course until the final exam, allowing students to achieve proficiency at their own pace. That is, a student who achieved “A”-level proficiency the day before the final exam would receive the same final grade as a student who worked at “A” level from the first day of class. This format allowed time for students to experiment with different methods until they gained control over data interpretation and discovered what would lead to success. Weekly formative assessments also spared students the worry and anxiety often associated with midterm exams.
Some students may have been nervous at first that their grade was determined solely by performance on the final exam. However, we found that students generally performed the same or better on the final exam than on the weekly assessments (for example, in year 3, 18% scored the same, and 57% scored higher on the final exam). In addition, students scored their own weekly assessments, so they had control over how their work was read and interpreted. Students were also offered the chance to present evidence of their performance on weekly assessments during the course to influence their final grade. These measures lent credibility to the assessments and held students accountable for their work. At the same time, students felt empowered and reassured by their role. These elements help explain some aspects of the positive affect and the perception that grading procedures were fair.
Because the course was designed with weekly assessment iterations, students had multiple opportunities to succeed. The total number of items on the weekly assessments was the same as the total number students were given on midterm exams in the original version of the course. Spacing questions throughout the semester allowed multiple attempts at success and multiple opportunities for students to receive feedback to inform improvement. If students did not perform well on a weekly assessment, they had the reassurance of another chance to try new methods in the upcoming week. The format of the assessments did not vary from week to week; instead, they represented a consistent voice reiterating the importance of data interpretation. This uniformity helped students recognize their improvement from week to week, which gave them the sense that they were succeeding.
In year 2, we attempted to elevate student motivation by adjusting each student's self-generated score against one given by the TA. The thinking was that performance might improve if students were informed more realistically of their status relative to a potential grade. This attempt appeared successful, since improvements in student performance and self-scoring accuracy were observed (C and ). However, student comments in class and responses on the affective survey (see items 35 and 38 in ) suggested that we were misguided in this approach; student attention appeared to focus more on assigning the correct grade than on the more productive elements of feedback. Moreover, some of the survey responses may also have reflected frustration or disagreement with the TA's perception of their answers to assessment items. Thus, we predicted that de-emphasizing the letter grades, combined with increased attention to formative feedback, would improve student attitudes while retaining the enhanced performance. Consequently, we abandoned the readjustment of self-assigned scores and, during year 3, focused greater attention on discussion and feedback at the end of each weekly assessment.
We were gratified to discover that not only were attitudes better during year 3, but performance increased further compared with year 2, and self-scoring accuracy was retained ( and , and ). Some of this success surely reflects instructor experience and comfort with the course structure. Nevertheless, much of the improvement in year 3 may also be attributed to the quality of feedback students received with each Friday assessment (see item 40 in ). This feedback was immediate and multidimensional; it included personal, peer, and instructor components. On completion of the assessment, students were first invited to spend 5 to 10 min sharing their written responses with one or more nearby partners in the classroom. The instructor then projected to the entire class a set of expert responses to the items, with a brief explanation of each. This was followed by an additional 10 min of interaction among students in small groups as they compared their responses with those of the expert. This discussion was animated and noisy, with frequent requests for input from the instructor or TA. Students were encouraged to help each other discover ways in which their responses could have been improved and how they might prepare better for the next assessment. Students were then offered the opportunity to remain in the classroom for an additional hour of discussion of the problems. This additional hour consisted of both small-group and whole-class conversations. Some of these discussions included sharing individual responses with the entire class and discussing the merits of those responses. The instructor periodically provided sample responses at different levels of quality, and students were given practice in evaluating and ranking them. The instructor then modeled his evaluation of the samples.
Throughout the various exercises used to provide feedback, the focus was always twofold: helping students obtain a realistic sense of their own performance and teaching them to make decisions about how they might improve.
Weekly observations of the class by the instructor supported the idea that the extra hour of feedback offered after assessments was key. Even though attendance was optional and the session was held late in the afternoon on Fridays, most of the students remained and actively participated. This level of engagement contrasted with that observed in previous years, when the instructor attempted to supplement the feedback provided immediately after assessments (commonly confined to ~10 min) by continuing the discussion on the following Monday. In those years, students appeared unresponsive and uninterested in further discussion of assessments from the previous week. This observation reinforced the thought that both the adequacy and the timeliness of the feedback were essential. Because the extra feedback hour during year 3 occurred immediately after an assessment, students were highly engaged with the material and receptive to learning. We conclude that one of the most important modifications that can be made to any classroom is to engineer situations such as this, in which students have an opportunity to try out their learning followed by clear and timely feedback.
Notwithstanding all the changes implemented throughout the three years, the time students reported spending outside of class remained constant. Students spent, or believed they spent, about 2 h less each week than students in the original version of the course. To the extent that this time difference was real, an increase in student efficiency is one possible explanation. Alternatively, the reported time difference could be perceptual: perhaps the time that students reported reflected the layout of the course or was affected by the direction the instructor gave about how much time students should spend outside of class. Because performance on data analysis tasks improved despite the decrease in time spent, we believe this change to be a positive one.
Faculty attempting to implement this teaching method may be concerned about the potential for grade inflation. In fact, if a criterion-based grading system is used and student performance improves, one would expect an elevation in grades. Nevertheless, this system did not appear to inflate grades inappropriately. For example, in the third year, when performance was the highest, the course grade point average was 3.19 ± 0.7 (SD). Moreover, the process of empowering students to have input into their course grade had only a modest effect on class grades. In year 3, only 22% of the class successfully justified a higher grade than what was achieved on the final exam. The average grade increment among those students was 0.35 ± 0.07 (SD) grade point units.
In summary, this study has demonstrated three important lessons. First, it has corroborated insights promoted by educational theorists and researchers: frequent, nonthreatening formative assessment is a valuable tool for instructors and students (Butler and Nissan, 1986; Black and Wiliam, 1998; Huba and Freed, 1999; Klionsky, 2001). Second, it has shown that these three aspects of assessment (frequent, nonthreatening, and formative) can be achieved in the context of developing higher-order thinking skills in the science classroom. Finally, it has emphasized that implementation of pedagogical reform in a course requires two critical elements analogous to those described here for student learning: formative feedback and iteration. As described above, the strong student performance and positive attitudes did not appear instantaneously upon adoption of the formative format. Without careful attention to performance and affective survey results, the instructors would not have made the decisions that led to success. Although occasional misinterpretation of the data from these evaluations can result in detours, such as the one we experienced during year 2, the process is self-correcting if applied consistently. Hence, attention to the data gathered during year 2 generated the ultimate success observed in year 3. Regardless of whether the specific format described in this report seems applicable to other courses or worthy of consideration, the process of course improvement illustrated here is imperative. The most important message we can communicate is that all instructors should be actively engaged in systematic evaluation and responsive decision-making in their courses.