Reliable and valid written tests of higher cognitive function are difficult to produce, particularly for the assessment of clinical problem solving. Modified essay questions (MEQs) are often used to assess these higher-order abilities in preference to other forms of assessment, including multiple-choice questions (MCQs). MEQs often form a vital component of end-of-course assessments in higher education, yet it is not clear how effectively these questions assess higher-order cognitive skills. This study was designed to assess the effectiveness of the MEQ in measuring higher-order cognitive skills in an undergraduate institution.
An analysis was undertaken of the multiple-choice questions and modified essay questions (MEQs) used for summative assessment in a clinical undergraduate curriculum. A total of 50 MCQs and 139 stages of MEQs, drawn from three exams run over two years, were examined. The effectiveness of each question was determined by two assessors and was defined by its ability to measure higher cognitive skills, as determined by a modification of Bloom's taxonomy, and by its quality, as determined by the presence of item-writing flaws.
Over 50% of all of the MEQs tested factual recall. This was similar to the percentage of MCQs testing factual recall. The modified essay question failed in its role of consistently assessing higher cognitive skills whereas the MCQ frequently tested more than mere recall of knowledge.
Construction of MEQs that assess higher-order cognitive skills cannot be assumed to be a simple task. Well-constructed MCQs should be considered a satisfactory replacement for MEQs if the MEQs cannot be designed to adequately test higher-order skills. Such MCQs are capable of withstanding the intellectual and statistical scrutiny imposed by a high-stakes exit examination.
The aim of this study was to evaluate the efficacy of a new psychiatry clerkship curriculum which was designed to improve the knowledge and skills of medical students of Tehran University of Medical Sciences (TUMS), Iran.
This quasi-experimental study was conducted in two consecutive semesters from February 2009 to January 2010. In total, 167 medical students participated in the study. In the first semester, the clerks, serving as the control group, were trained under the traditional curriculum. In the next semester, we constructed and applied a new curriculum based on the SPICES model (student-centered, problem-based, integrated, community-based, elective and systematic). At the end of the clerkship, the students were given two exams: multiple choice questions (MCQ) to assess their knowledge, and an Objective Structured Clinical Examination (OSCE) to assess their skills. Baseline data and test performance for each student were analyzed.
Compared to the control group, students in the intervention group showed significantly higher OSCE scores (P = 0.01). With respect to MCQ scores, no significant difference was found between the two groups.
The results suggest that the revised curriculum is more effective than the traditional one in improving the required clinical skills in medical students during their psychiatry clerkship.
Psychiatry; Clerkship; Education; Medical students; Curriculum
Clinical reasoning is a core competence of doctors. Therefore, the assessment of clinical reasoning in undergraduate students is an important part of medical education. Three medical universities in the Netherlands wish to develop a shared question database in order to assess clinical reasoning of undergraduate students in Computer-Based Assessments (CBA). To determine suitable question types for this purpose, a literature study was carried out. A search of ERIC and PubMed and subsequent cross-referencing yielded 30 articles that met the inclusion criteria: a focus on question types suitable for assessing clinical reasoning of medical students, and recommendations for their use. Script Concordance Tests, Extended Matching Questions, Comprehensive Integrative Puzzles, Modified Essay Questions/Short Answer Questions, Long Menu Questions, Multiple Choice Questions, Multiple True/False Questions and Virtual Patients meet the above-mentioned criteria, but for different reasons not all types can be used easily in CBA. A combination of Comprehensive Integrative Puzzles and Extended Matching Questions seems to assess most aspects of clinical reasoning, and these question types can be adapted for use in CBA. Regardless of the question type chosen, patient vignettes should be used as a standard stimulus format to assess clinical reasoning. Further research is necessary to ensure that the combination of these question types produces valid assessments and reliable test results.
Computer-Based Assessment; Clinical reasoning; Medical undergraduate students
AIM—To determine what standard paediatric medical students would set for examining their peers and how that would compare with the university standard.
DESIGN—Computer marked examination with questionnaire.
SUBJECTS—Medical students during their final paediatric attachment.
INTERVENTION—Students were asked to derive 10 five-branch, negatively marked multiple choice questions (MCQs) to a standard that would fail those without sufficient knowledge. Each set of 10 was then assessed by another student for degree of difficulty and relevance to paediatrics. One year later, student peers sat a mock MCQ examination derived from a random 40 of these questions, unaware that the mock MCQs had been derived by peers.
MEASURES—Comparison of marks obtained in the mock and final MCQ examinations; student perception of the standard in the two examinations, assessed by questionnaire.
RESULTS—Students derived 439 questions, of which 83% were considered an appropriate standard by a classmate. One year later, 62 students sat the mock examination. The distribution of marks was better in the mock MCQ examination than in the final MCQ examination. Students considered the mock questions to be a more appropriate standard (72% v 31%) and the topics more relevant (88% v 64%) to paediatric medical students. Questions were of a similar clarity in both examinations (73% v …).
CONCLUSIONS—Students in this study were able to derive an examination of a satisfactory standard for their peers. Involvement of students in deriving examination standards may give them a better appreciation of how standards should be set and maintained.
This study was carried out to assess the relationship between the various assessment parameters, viz. continuous assessment (CA), multiple choice questions (MCQ), essay, practical and oral, and the overall performance in the first professional examination in Physiology.
Materials and Methods:
The results of all 244 students who sat the examination over 4 years were used. The CA, MCQ, essay, practical, oral and overall performance scores were obtained. All scores were scaled to 100% to give each parameter equal weighting.
Analysis showed that the average overall performance was 50.8 ± 5.3. The best average performance was in practical (55.5 ± 9.1), while the lowest was in MCQ (44.1 ± 7.8). In the study, 81.1% of students passed orals, 80.3% passed practical, 72.5% passed CA, 58.6% passed essay, 22.5% passed MCQ and 71.7% of students passed on overall performance. All assessment parameters correlated significantly with overall performance. Continuous assessment had the best correlation (r = 0.801, P = 0.000), while oral had the weakest correlation (r = 0.277, P = 0.000) with overall performance. Essay was the best predictor of overall performance (β = 0.421, P = 0.000), followed by MCQ (β = 0.356, P = 0.000), while practical was the weakest predictor of performance (β = 0.162, P = 0.000).
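The component-versus-overall correlations reported above are plain Pearson coefficients; a minimal Python sketch (the scores below are hypothetical, not the study's data):

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical component and overall scores for five students:
ca = [70, 65, 58, 72, 60]        # continuous assessment
overall = [68, 62, 55, 74, 59]   # overall performance
r = pearson_r(ca, overall)       # a value near +1 indicates strong agreement
```

The standardized betas reported in the abstract come from regressing overall performance on all components simultaneously, so they can rank predictors even when the raw correlations are similar.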
We suggest that the department uphold the principle of continuous assessment and that more effort be made in the design of MCQs so that performance can improve.
Continuous assessment; essay; examination; MCQ; oral; practical
This paper is an attempt to produce a guide for improving the quality of multiple choice questions (MCQs) used in undergraduate and postgraduate assessment. The MCQ is the most frequently used assessment format worldwide. Well-constructed, context-rich MCQs have a high reliability per hour of testing. Avoidance of technical item flaws is essential to improve the validity evidence of MCQs. Technical item flaws are essentially of two types: (i) flaws related to testwiseness, and (ii) flaws related to irrelevant difficulty. A list of such flaws is presented, together with a discussion of each flaw and examples, to facilitate learning and make the paper learner-friendly. The paper was designed to be interactive, with self-assessment exercises followed by key answers with explanations.
Pitfalls; assessment; student
To investigate the teaching of cognitive skills within a technical skills course, we carried out a blinded, randomized prospective study.
Twenty-one junior residents (postgraduate years 1–3) from a single program at a surgical-skills training centre were randomized to 2 surgical skills courses teaching total knee arthroplasty. One course taught only technical skill and had more repetitions of the task (5 or 6). The other focused more on developing cognitive skills and had fewer task repetitions (3 or 4). All residents were tested with the Objective Structured Assessment of Technical Skill (OSATS) both before and after the course, completed pre- and postcourse error-detection exams, and sat a postcourse exam with multiple-choice questions (MCQs) to test their cognitive skills.
Both groups' technical skills as assessed by OSATS were equivalent, both pre- and postcourse. Taking their courses improved the technical skills of both groups (OSATS, p < 0.01) over their pre-course scores. Both groups demonstrated equivalent levels of knowledge on the MCQ exam, but the cognitive group scored better on the error-detection test (p = 0.02).
Cognitive skills training enhances the ability to correctly execute a surgical skill. Furthermore, specific training and practice are required to develop procedural knowledge into appropriate cognitive skills. Surgeons need to be trained to judge the correctness of their actions.
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong.
Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees or those with a positive option discrimination statistic.
The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating.
The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
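Under the definition above, a distractor functions only if it attracts at least 5% of examinees and has a negative option discrimination. A minimal Python sketch of that check (the function name and data are illustrative, not from the study):

```python
def flag_distractors(option_counts, key, option_disc, n_examinees, threshold=0.05):
    """Return {distractor: True if functioning} under the 5%-frequency
    and negative-discrimination criteria described above."""
    flags = {}
    for opt, count in option_counts.items():
        if opt == key:
            continue  # the correct response is not a distractor
        chosen_enough = count / n_examinees >= threshold
        negative_disc = option_disc[opt] < 0
        flags[opt] = chosen_enough and negative_disc
    return flags

# Hypothetical item: 100 examinees, key "A"
counts = {"A": 60, "B": 3, "C": 25, "D": 12}
disc = {"B": -0.05, "C": -0.20, "D": 0.10}
flags = flag_distractors(counts, "A", disc, 100)
# "B" fails the 5% rule, "D" has positive discrimination; only "C" functions
```

Applied over a whole test bank, counts of functioning distractors per item reproduce the 0/1/2/3 breakdown the study reports.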
The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods, and the two used most extensively in undergraduate medical education in the UK are the norm-reference and the criterion-reference methods. The aims of the study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method.
The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 4th-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard using the modified Angoff method for the same multiple-choice question paper on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and the Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method.
The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74.
There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
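The two cut-score rules being compared are simple to state; a minimal Python sketch, assuming the textbook formulation of the modified Angoff method (average the judges' borderline-candidate probabilities per item, then sum over items) and hypothetical scores:

```python
from statistics import mean, stdev

def norm_reference_cut(scores):
    """Norm-referenced pass mark: cohort mean minus one standard deviation."""
    return mean(scores) - stdev(scores)

def angoff_cut(item_ratings):
    """Modified Angoff pass mark: the expected raw score of a borderline
    candidate, i.e. the sum over items of the judges' mean estimated
    probability that such a candidate answers the item correctly."""
    return sum(mean(judges) for judges in item_ratings)

# Hypothetical cohort scores and judge ratings for a 3-item paper:
scores = [55, 60, 48, 70, 65, 52]
ratings = [[0.6, 0.7], [0.8, 0.7], [0.5, 0.6]]
cut_norm = norm_reference_cut(scores)
cut_angoff = angoff_cut(ratings)   # expected borderline raw score
```

Because the norm-referenced cut moves with each cohort while the Angoff cut is fixed by judges, the two can produce very different pass rates on the same paper, as the study found.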
People forget much of what they learn, so students could benefit from learning strategies that yield long-lasting knowledge. Yet surprisingly little is known about how long-term retention is most efficiently achieved. We studied the value of teacher-made in-class tests as learning aids, comparing two types of teacher-made tests (multiple choice and short answer) with a no-test control to determine their value as aids to retention learning.
The study was conducted on two separate batches of medical undergraduate students and compared two types of tests [multiple choice questions (MCQs) and short answer questions (SAQs)] with a no-test (control) group. The investigation involved initial testing at the end of the lecture (post instruction), followed three weeks later by an unannounced delayed retention test, comprising MCQs and SAQs on the same material, given to all three groups.
In batch I, the MCQ group had the highest mean delayed retention score (10.97), followed by the SAQ group (8.42) and the control group (6.71). An analysis of variance (ANOVA) and a least significant difference (LSD) post hoc test revealed statistically significant differences between the means of the three groups. Similar results were obtained for batch II.
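The group comparison above rests on a one-way ANOVA; its F statistic can be sketched in a few lines of Python (the scores below are illustrative, not the study's data):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over lists of group scores."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical retention scores for MCQ, SAQ and control groups:
f = one_way_anova_f([[11, 10, 12], [8, 9, 8], [7, 6, 7]])
# a large F relative to the F(k-1, n-k) critical value signals group differences
```

The LSD post hoc test then compares group pairs only after the ANOVA has rejected the hypothesis of equal means.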
Classroom testing has a positive effect on retention learning: both short-answer and multiple-choice tests were more effective than no test in promoting delayed retention learning, and multiple-choice tests were better still.
Initial testing; delayed retention tests; retention learning
Multiple-choice question (MCQ) examinations are increasingly used as the assessment method for theoretical knowledge in large class-size modules in many life science degrees. MCQ tests can be used to objectively measure factual knowledge, ability and high-level learning outcomes, but may also introduce gender bias in performance depending on topic, instruction, scoring and difficulty. The 'Single Answer' (SA) test is often used, in which students choose one correct answer and are unable to demonstrate partial knowledge. Negative marking eliminates the chance element of guessing but may be considered unfair. Elimination testing (ET) is an alternative form of MCQ that discriminates between all levels of knowledge while rewarding the demonstration of partial knowledge. Comparisons of performance and gender bias in negatively marked SA and ET tests have not yet been performed in the life sciences. Our results show that life science students were significantly advantaged by answering the MCQ test in elimination format compared to single answer format under negative marking conditions, because partial knowledge of topics was rewarded. Importantly, we found no significant difference in performance between genders in either cohort for either MCQ test under negative marking conditions. Surveys showed that students generally preferred ET-style MCQ testing over SA-style testing. Students reported feeling more relaxed taking ET MCQs and more stressed when sitting SA tests, while disagreeing that they were distracted by thinking about the best tactics for scoring highly. Students agreed that ET testing improved their critical thinking skills. We conclude that appropriately designed MCQ tests do not systematically discriminate between genders. We recommend careful consideration in choosing the type of MCQ test, and propose applying negative scoring conditions to each test type to avoid the introduction of gender bias.
The student experience could be improved through the incorporation of the elimination answering methods in MCQ tests via rewarding partial and full knowledge.
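The contrast between the two formats comes down to their scoring rules. A minimal Python sketch, assuming one commonly published elimination-scoring rule (+1 per correctly eliminated distractor, a large penalty for eliminating the key) and a typical −0.25 negative mark for SA; the exact marks used in the study may differ:

```python
def sa_negative_score(chosen, key, penalty=0.25):
    """Single-answer MCQ under negative marking: +1 correct, -penalty wrong."""
    return 1.0 if chosen == key else -penalty

def elimination_score(eliminated, key, n_options=4):
    """Elimination testing: +1 per distractor correctly eliminated,
    -(n_options - 1) if the key itself is eliminated."""
    if key in eliminated:
        return -(n_options - 1)
    return len(eliminated)

# Full knowledge eliminates all three distractors; partial knowledge
# (e.g. narrowing four options down to two) still earns credit:
full = elimination_score({"B", "C", "D"}, key="A")   # 3
partial = elimination_score({"B", "C"}, key="A")     # 2
```

The sketch makes the abstract's point concrete: under SA a student who can only rule out two options earns nothing, while ET converts that partial knowledge into marks.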
To assess the preferred methods for assessment among medical students at both preclinical and clinical stages of medical education and the possible correlates that promote these preferences.
Subjects and methods
All medical students from the third year onwards were surveyed. A self-administered anonymous questionnaire was designed to gather information on the preferred assessment method for course achievement. The preferred methods were also evaluated in relation to cognitive functions. Preference for specific exam format, in the form of multiple choices, short essay questions, or both, and the stated reasons for that preference, was also included in the questionnaire.
Out of 310 questionnaires distributed, 238 were returned. Written tests, projects, portfolios, and take-home exams were the preferred modes for assessing students' achievements in a course; oral tests, including a viva voce, were the least preferred type of assessment. Questions that tested the domains of 'understanding' and 'application' were the most preferred, while those entailing 'analysis' were the least preferred. The multiple choice question format was the most preferred type of question (68.7%) at both preclinical and clinical stages.
Students' assessments at the College of Medicine, King Faisal University, Saudi Arabia, do not use the full range of cognitive domains. The emphasis on higher domains incorporating critical thinking should increase as students progress through their medical courses.
medical students; assessment; exams; multiple choices; essay
The purpose of this study is to describe an approach for evaluating assessments used in the first 2 years of medical school and report the results of applying this method to current first and second year medical student examinations.
Three faculty members coded all exam questions administered during the first 2 years of medical school. The reviewers discussed and compared the coded exam questions at bi-monthly meetings, where all differences in coding were resolved with consensus as the final criterion. We applied Moore's framework to assist the review process and to align it with National Board of Medical Examiners (NBME) standards.
The first and second year medical school examinations contained no competence-level questions. The majority of test questions, more than 50%, were at the NBME recall level.
It is essential that multiple-choice questions (MCQs) test the attitudes, skills, knowledge, and competency in medical school. Based on our findings, it is evident that our exams need to be improved to better prepare our medical students for successful completion of NBME step exams.
undergraduate medical education; assessment; course exams; NBME
We present a diagnostic question cluster (DQC) that assesses undergraduates' thinking about photosynthesis. This assessment tool is not designed to identify individual misconceptions. Rather, it is focused on students' abilities to apply basic concepts about photosynthesis by reasoning with a coordinated set of practices based on a few scientific principles: conservation of matter, conservation of energy, and the hierarchical nature of biological systems. Data on students' responses to the cluster items and uses of some of the questions in multiple-choice, multiple-true/false, and essay formats are compared. A cross-over study indicates that the multiple-true/false format shows promise as a machine-gradable format that identifies students who have a mixture of accurate and inaccurate ideas. In addition, interviews with students about their choices on three multiple-choice questions reveal the fragility of students' understanding. Collectively, the data show that many undergraduates lack both a basic understanding of the role of photosynthesis in plant metabolism and the ability to reason with scientific principles when learning new content. Implications for instruction are discussed.
Objective: The purpose of the study was to identify technical item flaws in the multiple choice questions submitted for the final exams for the years 2009, 2010 and 2011.
Methods: This descriptive analytical study was carried out at Islamic International Medical College (IIMC). Data were collected from the MCQs submitted by the faculty for the final exams for the years 2009, 2010 and 2011, and were compiled and evaluated by a three-member assessment committee. The data were analyzed for frequencies and percentages; categorical data were analyzed with the chi-square test.
Results: The overall percentage of flawed items was 67% for 2009, of which 21% were for testwiseness and 40% for irrelevant difficulty. In 2010, total item flaws were 36%, with 11% for testwiseness and 22% for irrelevant difficulty. The 2011 data showed a decreased overall flaw rate of 21%: 7% for testwiseness and 11% for irrelevant difficulty.
Conclusion: Technical item flaws are frequently encountered during MCQ construction, and the identification of flaws leads to improved quality of single-best MCQs.
Frequency; Item writing flaws; Testwiseness
The assessment of the anesthesia course in our university comprises Objective Structured Clinical Examinations (OSCEs), in conjunction with portfolio and multiple-choice questions (MCQ). The objective of this study was to evaluate the outcome of different forms of anesthesia course assessment among 5th year medical students in our university, as well as study the influence of gender on student performance in anesthesia.
We examined the performance of 154 5th-year medical students through OSCE, portfolios, and MCQs.
The score ranges in the portfolio, OSCE, and MCQs were 16-24, 4.2-28.9, and 15.5-44.5, respectively. There was a highly significant difference in scores in relation to gender in all assessments other than the written one (P = 0.000 for portfolio, OSCE, and total exam; P = 0.164 for the written exam). In the generated linear regression model, OSCE alone could predict 86.4% of the total mark. If the score of the written examination is added, OSCE drops to 57.2% and the written exam accounts for 56.8% of the total mark.
This study demonstrates that the different clinical methods used to assess medical students during their anesthesia course were consistent and integrated. Female students outperformed male students in the OSCE and portfolio. This information is the basis for improving educational and assessment standards in anesthesiology and for introducing a platform for developing modern learning media in countries with a dearth of anesthesia personnel.
Anesthesiology; assessment; gender; objective structured clinical examination; portfolios and multiple-choice questions
Sri Lankan rural doctors based in isolated peripheral hospitals routinely resuscitate critically ill patients but have difficulty accessing training. We tested a train-the-trainer model that could be utilised in isolated rural hospitals.
Eight selected rural hospital non-specialist doctors attended a 2-day instructor course. These “trained trainers” educated their colleagues in advanced cardiac life support at peripheral hospital workshops, and we tested their students in resuscitation knowledge and skills pre and post training, and at 6 and 12 weeks. Knowledge was assessed through 30 multiple choice questions (MCQ), and resuscitation skills were assessed by performance in a video-recorded simulated cardiac arrest scenario using a Resusci Anne Skill Trainer mannequin.
Fifty-seven doctors were trained. Pre and post training assessment was possible for 51 participants, and 6-week and 12-week follow up was possible for 43 and 38 participants, respectively. Mean MCQ scores significantly improved over time (p<0.001), and a significant improvement was noted in “average ventilation volume”, “compression count”, “compressions with no error”, “adequate depth”, “average depth”, and “compression rate” (p<0.01). The proportion of participants with compression depth ≥40 mm increased post intervention (p<0.05) and at 12-week follow up (p<0.05), and the proportion of ventilation volumes between 400 and 1000 ml increased post intervention (p<0.001). A significant increase was also noted in the proportion of participants who “checked for responsiveness”, “opened the airway”, “performed a breathing check”, used the “correct compression ratio”, and used an “appropriate facemask technique” (p<0.001). A train-the-trainer model of resuscitation education was effective in improving resuscitation knowledge and skills in Sri Lankan rural peripheral hospital doctors. Improvement was sustained to 12 weeks for most components of resuscitation knowledge and skills. Further research is needed to identify which components of training are most effective in producing sustained improvement in resuscitation.
As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ) has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge.
We analysed the results from the Extended Matching Questions Examination taken by 4th year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped on to a unidimensional scale, the degree of difficulty of questions within and between the various medical and surgical specialties and the pattern of responses within individual questions to assess the impact of the distractor options.
Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified.
Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows for a formal test of the unidimensionality of the questions and thus the validity of the summed score. Given the metric calibration which follows fit to the model, it also allows for the establishment of items banks to facilitate continuity and equity in exam standards.
Undergraduate medical examination is undergoing extensive re-evaluation, with new core educational objectives being defined. Consequently, new exam systems have been designed to test these objectives. The objective structured practical examination (OSPE) is one of them.
To introduce OSPE as a method of assessment of practical skills and learning and to determine student satisfaction regarding the OSPE. Furthermore, to explore the faculty perception of OSPE as a learning and assessment tool.
Materials and Methods:
The first M.B.B.S. students of the 2011-12 batch of Medical College, Kolkata, were the subjects for the study. An OSPE was organized and conducted on “Identification of Unknown Abnormal Constituents in Urine.” The reliability of the questions administered was estimated by calculating Cronbach's alpha. A questionnaire on various components of the OSPE was administered to obtain feedback.
Sixteen students failed to achieve an average of 50% or above in the assessment; 49 students averaged >75%, 52 students scored between 65% and 75%, and 29 students scored between 50% and 65%. Cronbach's alpha showed the questions to have high internal consistency, with a score of 0.80. Ninety-nine percent of students believed that the OSPE helps them to improve, and 81% felt that this type of assessment serves as both a learning and an evaluation tool. Faculty feedback reflected that such assessment tested objectivity, measured practical skills better, and eliminated examiner bias to a greater extent.
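The internal-consistency figure reported here is Cronbach's alpha, which follows directly from per-item score variances; a minimal Python sketch (the data layout and values are illustrative, not the study's):

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is examinee j's score on item i."""
    k = len(item_scores)
    sum_item_var = sum(variance(item) for item in item_scores)
    totals = [sum(per_item) for per_item in zip(*item_scores)]
    return k / (k - 1) * (1 - sum_item_var / variance(totals))

# Hypothetical 3-item quiz taken by four examinees (1 = correct, 0 = wrong):
items = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]
alpha = cronbach_alpha(items)   # values near 0.8 or above indicate good consistency
```

Alpha rises when items vary together across examinees (shared true-score variance) and falls when item responses are unrelated.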
The OSPE tests the different desired components of competence better and eliminates examiner bias. Student feedback reflects that such assessment helps them to improve, as it is effective as both a teaching and an evaluation tool.
Biochemistry; evaluation; objective structured practical examination
In Arab countries there are few studies on assessment methods in the field of psychiatry. The objective of this study was to assess the outcome of different forms of psychiatric course assessment among fifth year medical students at King Faisal University, Saudi Arabia.
We examined the performance of 110 fifth-year medical students through objective structured clinical examinations (OSCE), traditional oral clinical examinations (TOCE), portfolios, multiple choice questions (MCQ), and a written examination.
The score ranges in TOCE, OSCE, portfolio, and MCQ were 32–50, 7–15, 5–10 and 22–45, respectively. In regression analysis, there was a significant correlation between OSCE and all forms of psychiatry examinations, except for the MCQ marks. OSCE accounted for 65.1% of the variance in total clinical marks and 31.5% of the final marks (P = 0.001), while TOCE alone accounted for 74.5% of the variance in the clinical scores.
This study demonstrates a consistency among the students’ assessment methods used in the psychiatry course, particularly the clinical component, in an integrated manner. This information would be useful for future developments in undergraduate teaching.
Undergraduate medical students; Assessment; Psychiatry; Undergraduate; Saudi Arabia
There has been comparatively little consideration of the impact that the changes to undergraduate curricula might have on postgraduate academic performance. This study compares the performance of graduates by UK medical school and gender in the Multiple Choice Question (MCQ) section of the first part of the Fellowship of the Royal College of Anaesthetists (FRCA) examination.
Data from each sitting of the MCQ section of the primary FRCA examination from June 1999 to May 2008 were analysed for performance by medical school and gender.
There were 4983 attempts at the MCQ part of the examination by 3303 graduates from the 19 United Kingdom medical schools. Using the standardised overall mark minus the pass mark, graduates from five medical schools performed significantly better than the mean for the group and five schools performed significantly worse. Males performed significantly better than females in all aspects of the MCQ – physiology, mean difference = 3.0% (95% CI 2.3, 3.7), p < 0.001; pharmacology, mean difference = 1.7% (95% CI 1.0, 2.3), p < 0.001; physics with clinical measurement, mean difference = 3.5% (95% CI 2.8, 4.1), p < 0.001; overall mark, mean difference = 2.7% (95% CI 2.1, 3.3), p < 0.001; and standardised overall mark minus the pass mark, mean difference = 2.5% (95% CI 1.9, 3.1), p < 0.001. Graduates from three medical schools that have undergone the change from traditional to problem-based learning curricula did not show any change in performance in any aspect of the MCQ pre and post curriculum change.
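The gender comparisons above are mean differences with 95% confidence intervals; for samples of this size a normal-approximation interval is adequate, and can be sketched directly (the data below are illustrative, not the study's):

```python
from math import sqrt
from statistics import mean, variance

def mean_diff_ci(a, b, z=1.96):
    """Mean difference (a minus b) with a normal-approximation 95% CI."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff, (diff - z * se, diff + z * se)

# Hypothetical male and female section scores (percent):
males = [62.0, 58.5, 64.0, 60.5, 59.0]
females = [59.5, 57.0, 61.0, 58.0, 56.5]
diff, (lo, hi) = mean_diff_ci(males, females)
# an interval excluding zero corresponds to a significant difference
```

A CI such as (2.3, 3.7) that excludes zero is what the abstract's p < 0.001 results reflect.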
Graduates from each of the medical schools in the UK do show differences in performance in the MCQ section of the primary FRCA, but significant curriculum change does not lead to deterioration in post graduate examination performance. Whilst females now outnumber males taking the MCQ, they are not performing as well as the males.
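The mean differences and 95% confidence intervals quoted above follow the standard large-sample form, difference ± 1.96 × standard error. A minimal sketch with hypothetical mark lists (the per-group data are not published in the abstract):

```python
import math

def mean_diff_ci(a, b, z=1.96):
    """Mean difference (a - b) with a normal-approximation 95% CI."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)  # standard error of the difference
    d = ma - mb
    return d, (d - z * se, d + z * se)

# Hypothetical physiology marks (%) for two small groups.
group_a = [70, 72, 74, 76, 78]
group_b = [67, 69, 71, 73, 75]
diff, ci = mean_diff_ci(group_a, group_b)
print(round(diff, 1), tuple(round(c, 2) for c in ci))  # 3.0 (-0.92, 6.92)
```

With the study's thousands of attempts per group the standard error shrinks, which is why a 2.5–3.5% difference is highly significant there but would not be in a sample this small.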
Learning science requires higher-level (critical) thinking skills that need to be practiced in science classes. This study tested the effect of exam format on critical-thinking skills. Multiple-choice (MC) testing is common in introductory science courses, and students in these classes tend to associate memorization with MC questions; they may not see the need to modify their study strategies for critical thinking because the MC exam format has not changed. To test the effect of exam format, I used two sections of an introductory biology class. One section was assessed with exams in the traditional MC format; the other was assessed with both MC and constructed-response (CR) questions. The mixed exam format was correlated with significantly more cognitively active study behaviors and significantly better performance on the cumulative final exam (after accounting for grade point average and gender). There was also less gender bias in the CR answers. This suggests that the MC-only exam format indeed hinders critical thinking in introductory science classes. Introducing CR questions encouraged students to learn more and to become better critical thinkers, and it reduced gender bias. However, student resistance increased as students adjusted their perceptions of their own critical-thinking abilities.
Confidence-based marking (CBM), developed by A. R. Gardner-Medwin et al., has been used for many years in the medical school setting as an assessment tool. Our study evaluates the use of CBM in the neuroanatomy laboratory setting, and its effectiveness as a tool for student self-assessment and learning.
The subjects were 224 students enrolled in Neuroscience I over a period of four trimesters. Regional neuroanatomy multiple-choice question (MCQ) quizzes were administered the week following topic presentation in the laboratory. A total of six quizzes was administered during the trimester; each MCQ was paired with a confidence question, and the paired questions were scored using a three-level CBM scoring scheme.
Spearman's rho correlation coefficients indicated that the number of correct answers was highly correlated with the CBM score (high, medium, low) for each topic. The χ² analysis within each neuroscience topic showed that the distribution of students across low, medium, and high confidence levels was a function of the number of correct answers on the quiz (p < .05). Pairwise comparisons of quiz performance with CBM score as the covariate showed that students' understanding of course content was greatest for information related to the spinal cord and medulla, and least for information related to the midbrain and cerebrum.
CBM is a reliable strategy for challenging students to think discriminatively, based on their knowledge of the material. The three-level CBM scoring scheme was a valid tool for assessing student learning of core neuroanatomical topics regarding structure and function.
Chiropractic; Educational Measurement; Learning
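The abstract does not state the mark values behind its three-level scheme. Gardner-Medwin's widely used CBM scheme awards 1/2/3 marks for a correct answer at confidence levels 1/2/3 and 0/−2/−6 for an incorrect one; the sketch below assumes those values for illustration:

```python
# Assumed three-level CBM mark scheme (Gardner-Medwin style);
# the study above does not publish its actual mark values.
CORRECT_MARKS = {1: 1, 2: 2, 3: 3}
WRONG_MARKS = {1: 0, 2: -2, 3: -6}

def cbm_score(answers):
    """Total CBM mark for a quiz.

    answers: iterable of (is_correct, confidence_level) pairs,
    where confidence_level is 1 (low), 2 (medium), or 3 (high).
    """
    return sum(
        CORRECT_MARKS[conf] if ok else WRONG_MARKS[conf]
        for ok, conf in answers
    )

# Five-question quiz: high confidence pays off only when correct.
quiz = [(True, 3), (True, 2), (False, 3), (True, 1), (False, 1)]
print(cbm_score(quiz))  # 3 + 2 - 6 + 1 + 0 = 0
```

The steep penalty for a confident wrong answer is what pushes students to report confidence honestly rather than always choosing the highest level.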
Global cognitive and psychomotor assessment in simulation-based curricula is complex. We describe the assessment of novices' cognitive skills in a trauma curriculum using a simulation-aligned facilitated-discovery method.
Third-year medical students in a surgery clerkship completed two student-written simulation scenarios (SWSS) as an assessment method in a trauma curriculum employing high fidelity human patient simulators (manikins). SWSS consisted of written physiologic parameters, intervention responses, a performance evaluation form, and a critical interventions checklist.
Seventy-one students participated. SWSS scores were compared to a multiple-choice test (MCQ), checklist-graded solo performance in a trauma scenario (STS), and clerkship summative evaluation grades. The SWSS appeared to be slightly better than the STS in discriminating between Honors and non-Honors students, although the mean scores of Honors and non-Honors students on the SWSS, STS, or MCQ were not significantly different. The SWSS exhibited good equivalent-form reliability (r = 0.88) and higher interrater reliability than the STS (r = 0.93 vs r = 0.79).
The SWSS is a promising assessment method for simulation-based curricula.
Surgical skills courses are an important part of learning during surgical training. The assessments at these courses tend to be subjective and anecdotal. Objective assessment using multiple choice questions (MCQs) quantifies the learning experience for both the organisers and the participants.
Participants of the open shoulder surgical skills course conducted at The Royal College of Surgeons of England in 2005 and 2006 underwent assessment using MCQs prior to and after the course.
The participants were grouped as non-consultants (n = 14) and consultant orthopaedic surgeons (n = 8). All participants improved after attending the course; the average improvement was 17% (range, 4–43%). We compared the two groups while adjusting for the association between pre-course score and score gain, having found a strong correlation between the two (r = 0.734; P = 0.001). Adjusted for pre-course score, the score gain (learning) for the non-consultants was slightly larger than for the consultants, but the difference did not reach statistical significance (P = 0.247).
All participants had a positive learning experience, and the benefit did not correlate significantly with the grade of surgeon.
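Adjusting a group comparison for the association between pre-course score and score gain can be illustrated by regressing gain on pre-course score and comparing the groups on the residuals (an ANCOVA-style adjustment). The data below are invented for illustration; they are not the course scores:

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical pre-course scores (%) and score gains (%).
pre = [40, 45, 50, 55, 60, 65, 70, 75]
gain = [43, 38, 30, 27, 20, 17, 10, 7]
group = ["non-consultant"] * 4 + ["consultant"] * 4

slope, intercept = fit_line(pre, gain)  # slope < 0: high scorers gain less
resid = [g - (slope * p + intercept) for p, g in zip(pre, gain)]

# Adjusted comparison: mean residual gain per group.
for grp in ("non-consultant", "consultant"):
    vals = [r for r, g in zip(resid, group) if g == grp]
    print(grp, round(sum(vals) / len(vals), 2))
```

By construction the residuals remove the pre-course effect, so any remaining group difference is the adjusted gain; in this invented example the two adjusted means are nearly equal, mirroring the non-significant adjusted difference reported above.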
Assessment; Surgical skills course