Reliable and valid written tests of higher cognitive function are difficult to produce, particularly for the assessment of clinical problem solving. Modified essay questions (MEQs) are often used to assess these higher-order abilities in preference to other forms of assessment, including multiple-choice questions (MCQs). MEQs often form a vital component of end-of-course assessments in higher education, yet it is not clear how effectively these questions assess higher-order cognitive skills. This study was designed to assess the effectiveness of the MEQ in measuring higher-order cognitive skills in an undergraduate institution.
An analysis of multiple-choice questions and modified essay questions (MEQs) used for summative assessment in a clinical undergraduate curriculum was undertaken. A total of 50 MCQs and 139 stages of MEQs, drawn from three examinations run over two years, were examined. The effectiveness of each question was determined by two assessors and was defined by the question's ability to measure higher cognitive skills, as determined by a modification of Bloom's taxonomy, and by its quality, as determined by the presence of item-writing flaws.
Over 50% of the MEQs tested factual recall, a percentage similar to that of the MCQs. The modified essay question failed in its role of consistently assessing higher cognitive skills, whereas the MCQ frequently tested more than mere recall of knowledge.
Constructing MEQs that assess higher-order cognitive skills cannot be assumed to be a simple task. Well-constructed MCQs should be considered a satisfactory replacement for MEQs if the MEQs cannot be designed to adequately test higher-order skills. Such MCQs are capable of withstanding the intellectual and statistical scrutiny imposed by a high-stakes exit examination.
OBJECTIVE—To determine what standard paediatric medical students would set for examining their peers and how that would compare with the university standard.
DESIGN—Computer marked examination with questionnaire.
SUBJECTS—Students during their final paediatric attachment.
INTERVENTIONS—Students were asked to derive 10 five-branch, negatively marked multiple choice questions (MCQs) to a standard that would fail those without sufficient knowledge. Each set of 10 was then assessed by another student for degree of difficulty and relevance to paediatrics. One year later, student peers sat a mock MCQ examination derived from a random 40 of these questions, unaware that the mock MCQs had been derived by peers.
MEASURES—Comparison of marks obtained in the mock and final MCQ examinations; student perception of the standard in the two examinations, assessed by questionnaire.
RESULTS—Students derived 439 questions, of which 83% were considered an appropriate standard by a classmate. One year later, 62 students sat the mock examination. The distribution of marks was better in the mock MCQ examination than in the final MCQ examination. Students considered the mock questions to be a more appropriate standard (72% v 31%) and the topics more relevant (88% v 64%) to paediatric medical students. Questions were of similar clarity in both examinations (73% v …).
CONCLUSIONS—Students in this study were able to derive an examination of a satisfactory standard for their peers. Involvement of students in deriving examination standards may give them a better appreciation of how standards should be set and maintained.
This paper attempts to produce a guide for improving the quality of multiple choice questions (MCQs) used in undergraduate and postgraduate assessment. The MCQ is the most frequently used assessment format worldwide. Well-constructed, context-rich MCQs have a high reliability per hour of testing. Avoidance of technical item flaws is essential to improve the validity evidence of MCQs. Technical item flaws are essentially of two types: (i) flaws related to testwiseness, and (ii) flaws related to irrelevant difficulty. A list of such flaws is presented, together with a discussion of each flaw and examples, to facilitate learning and make the paper learner-friendly. The paper was designed to be interactive, with self-assessment exercises followed by the key answers with explanations.
Pitfalls; assessment; student
The outcome of assessments is determined by the standard-setting method used. There is a wide range of standard-setting methods; the two used most extensively in undergraduate medical education in the UK are the norm-reference and criterion-reference methods. The aims of this study were to compare these two standard-setting methods for a multiple-choice question examination and to estimate the test-retest and inter-rater reliability of the modified Angoff method.
The norm-reference method of standard-setting (mean minus 1 SD) was applied to the 'raw' scores of 78 fourth-year medical students on a multiple-choice question (MCQ) examination. Two panels of raters also set the standard for the same multiple-choice question paper using the modified Angoff method on two occasions (6 months apart). We compared the pass/fail rates derived from the norm-reference and Angoff methods and also assessed the test-retest and inter-rater reliability of the modified Angoff method.
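For concreteness, the following is a minimal Python sketch of the arithmetic behind the two cut-score rules described above. The data, rater counts, and variable names are hypothetical, and the panel-discussion elements of the modified Angoff procedure are not reproduced; only the calculations are shown.

```python
import statistics

def norm_reference_cutoff(raw_scores, sd_multiplier=1.0):
    """Norm-reference standard: cohort mean minus 1 SD of the raw scores."""
    return statistics.mean(raw_scores) - sd_multiplier * statistics.stdev(raw_scores)

def angoff_cutoff(judgements):
    """Modified Angoff standard: each rater estimates, item by item, the
    probability that a borderline candidate answers correctly; a rater's
    cut score is the sum of those probabilities, and the panel's cut
    score is the mean across raters."""
    return statistics.mean(sum(item_probs) for item_probs in judgements)

# Hypothetical example: five candidates on a 40-item paper, three raters.
raw_scores = [28, 34, 31, 22, 39]
judgements = [
    [0.6] * 20 + [0.7] * 20,   # rater 1's per-item probabilities
    [0.5] * 20 + [0.8] * 20,   # rater 2
    [0.7] * 20 + [0.6] * 20,   # rater 3
]
print(norm_reference_cutoff(raw_scores))  # 30.8 - 6.4 ≈ 24.4
print(angoff_cutoff(judgements))          # (26 + 26 + 26) / 3 = 26.0
```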
The pass rate with the norm-reference method was 85% (66/78) and that with the Angoff method was 100% (78/78). The percentage agreement between the Angoff and norm-reference methods was 78% (95% CI 69%–87%). The modified Angoff method had an inter-rater reliability of 0.81–0.82 and a test-retest reliability of 0.59–0.74.
There were significant differences in the outcomes of these two standard-setting methods, as shown by the difference in the proportion of candidates that passed and failed the assessment. The modified Angoff method was found to have good inter-rater reliability and moderate test-retest reliability.
To investigate the teaching of cognitive skills within a technical skills course, we carried out a blinded, randomized prospective study.
Twenty-one junior residents (postgraduate years 1–3) from a single program at a surgical-skills training centre were randomized to 2 surgical skills courses teaching total knee arthroplasty. One course taught only technical skills, with more repetitions of the task (5 or 6). The other focused more on developing cognitive skills and had fewer task repetitions (3 or 4). All residents were tested with the Objective Structured Assessment of Technical Skill (OSATS) both before and after the course, as well as with a pre- and postcourse error-detection exam and a postcourse exam with multiple-choice questions (MCQs) to test their cognitive skills.
Both groups' technical skills as assessed by OSATS were equivalent, both pre- and postcourse. Taking their courses improved the technical skills of both groups (OSATS, p < 0.01) over their pre-course scores. Both groups demonstrated equivalent levels of knowledge on the MCQ exam, but the cognitive group scored better on the error-detection test (p = 0.02).
Cognitive skills training enhances the ability to correctly execute a surgical skill. Furthermore, specific training and practice are required to develop procedural knowledge into appropriate cognitive skills. Surgeons need to be trained to judge the correctness of their actions.
Clinical reasoning is a core competence of doctors. Therefore, the assessment of clinical reasoning of undergraduate students is an important part of medical education. Three medical universities in the Netherlands wish to develop a shared question database in order to assess clinical reasoning of undergraduate students in Computer-Based Assessments (CBA). To determine suitable question types for this purpose, a literature study was carried out. A search of ERIC and PubMed and subsequent cross-referencing yielded 30 articles which met the inclusion criteria of a focus on question types suitable to assess clinical reasoning of medical students and providing recommendations for their use. Script Concordance Tests, Extended Matching Questions, Comprehensive Integrative Puzzles, Modified Essay Questions/Short Answer Questions, Long Menu Questions, Multiple Choice Questions, Multiple True/False Questions and Virtual Patients meet the above-mentioned criteria, but for different reasons not all types can be used easily in CBA. A combination of Comprehensive Integrative Puzzles and Extended Matching Questions seems to assess most aspects of clinical reasoning, and these question types can be adapted for use in CBA. Regardless of the question type chosen, patient vignettes should be used as a standard stimulus format to assess clinical reasoning. Further research is necessary to ensure that the combination of these question types produces valid assessments and reliable test results.
Computer-Based Assessment; Clinical reasoning; Medical undergraduate students
People forget much of what they learn; students could therefore benefit from learning strategies that yield long-lasting knowledge. Yet surprisingly little is known about how long-term retention is most efficiently achieved. We studied the value of teacher-made in-class tests as learning aids, comparing two types of teacher-made tests (multiple choice and short answer) with no test (control) to determine their value as aids to retention learning.
The study was conducted on two separate batches of medical undergraduate students and compared two types of tests, multiple choice questions (MCQs) and short answer questions (SAQs), with a no-test (control) group. The investigation involved initial testing at the end of the lecture (post-instruction), followed three weeks later by an unannounced delayed retention test, comprising MCQs and SAQs on the same material, given to all three groups.
In batch I, the MCQ group had the highest mean delayed retention score (10.97), followed by the SAQ group (8.42) and the control group (6.71). An analysis of variance (ANOVA) and a least significant difference (LSD) post hoc test revealed statistically significant differences between the means of the three groups. Similar results were obtained for batch II.
Classroom testing has a positive effect on retention learning: both short-answer and multiple-choice tests are more effective than no test in promoting delayed retention, and multiple-choice tests are the more effective of the two.
Initial testing; delayed retention tests; retention learning
Four- or five-option multiple choice questions (MCQs) are the standard in health-science disciplines, both on certification-level examinations and on in-house developed tests. Previous research has shown, however, that few MCQs have three or four functioning distractors. The purpose of this study was to investigate non-functioning distractors in teacher-developed tests in one nursing program in an English-language university in Hong Kong.
Using item-analysis data, we assessed the proportion of non-functioning distractors on a sample of seven test papers administered to undergraduate nursing students. A total of 514 items were reviewed, including 2056 options (1542 distractors and 514 correct responses). Non-functioning options were defined as those chosen by fewer than 5% of examinees, as well as those with a positive option discrimination statistic.
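As an illustration, here is a short Python sketch of this screening rule. It assumes the option discrimination statistic is operationalised as the difference in mean total score between examinees who chose the option and those who did not; the study's exact statistic (e.g., a point-biserial) may differ, and all data and names here are hypothetical.

```python
def functioning_distractors(responses, scores, key, all_options, min_freq=0.05):
    """Return the distractors of one item that 'function': chosen by at
    least 5% of examinees AND showing non-positive option discrimination
    (a distractor should attract weaker examinees, not stronger ones)."""
    n = len(responses)
    functioning = []
    for opt in all_options:
        if opt == key:
            continue
        chosen = [s for r, s in zip(responses, scores) if r == opt]
        if len(chosen) / n < min_freq:        # chosen by < 5% -> non-functioning
            continue
        rest = [s for r, s in zip(responses, scores) if r != opt]
        discrimination = sum(chosen) / len(chosen) - sum(rest) / len(rest)
        if discrimination <= 0:
            functioning.append(opt)
    return functioning

# Hypothetical item: key 'A'; option 'D' is never chosen, so it cannot function.
responses = ["A", "B", "A", "C", "A", "B", "A", "A"]
scores    = [ 30,  18,  27,  21,  25,  16,  29,  24]
print(functioning_distractors(responses, scores, "A", ["A", "B", "C", "D"]))  # ['B', 'C']
```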
The proportion of items containing 0, 1, 2, and 3 functioning distractors was 12.3%, 34.8%, 39.1%, and 13.8% respectively. Overall, items contained an average of 1.54 (SD = 0.88) functioning distractors. Only 52.2% (n = 805) of all distractors were functioning effectively and 10.2% (n = 158) had a choice frequency of 0. Items with more functioning distractors were more difficult and more discriminating.
The low frequency of items with three functioning distractors in the four-option items in this study suggests that teachers have difficulty developing plausible distractors for most MCQs. Test items should consist of as many options as is feasible given the item content and the number of plausible distractors; in most cases this would be three. Item analysis results can be used to identify and remove non-functioning distractors from MCQs that have been used in previous tests.
The assessment of the anesthesia course in our university comprises Objective Structured Clinical Examinations (OSCEs) in conjunction with portfolios and multiple-choice questions (MCQs). The objective of this study was to evaluate the outcomes of different forms of anesthesia course assessment among 5th-year medical students in our university, as well as to study the influence of gender on student performance in anesthesia.
We examined the performance of 154 fifth-year medical students through OSCEs, portfolios, and MCQs.
The score ranges in the portfolio, OSCE, and MCQs were 16–24, 4.2–28.9, and 15.5–44.5, respectively. There was a highly significant difference in scores by gender in all assessments other than the written one (P < 0.001 for the portfolio, OSCE, and total exam; P = 0.164 for the written exam). In the generated linear regression model, the OSCE alone could predict 86.4% of the total mark. If the score of the written examination is added, the OSCE's contribution drops to 57.2% and the written exam accounts for 56.8% of the total mark.
This study demonstrates that the different clinical methods used to assess medical students during their anesthesia course were consistent and integrated. Female students outperformed male students in the OSCE and portfolio. This information provides a basis for improving educational and assessment standards in anesthesiology and for introducing a platform for developing modern learning media in countries with a dearth of anesthesia personnel.
Anesthesiology; assessment; gender; objective structured clinical examination; portfolios and multiple-choice questions
In Arab countries there are few studies on assessment methods in the field of psychiatry. The objective of this study was to assess the outcome of different forms of psychiatric course assessment among fifth year medical students at King Faisal University, Saudi Arabia.
We examined the performance of 110 fifth-year medical students through objective structured clinical examinations (OSCE), traditional oral clinical examinations (TOCE), portfolios, multiple choice questions (MCQ), and a written examination.
The score ranges in TOCE, OSCE, portfolio, and MCQ were 32–50, 7–15, 5–10 and 22–45, respectively. In regression analysis, there was a significant correlation between OSCE and all forms of psychiatry examinations, except for the MCQ marks. OSCE accounted for 65.1% of the variance in total clinical marks and 31.5% of the final marks (P = 0.001), while TOCE alone accounted for 74.5% of the variance in the clinical scores.
This study demonstrates consistency among the assessment methods used in the psychiatry course, particularly the clinical components, which functioned in an integrated manner. This information would be useful for future developments in undergraduate teaching.
Undergraduate medical students; Assessment; Psychiatry; Undergraduate; Saudi Arabia
Confidence-based marking (CBM), developed by A. R. Gardner-Medwin et al., has been used for many years in the medical school setting as an assessment tool. Our study evaluates the use of CBM in the neuroanatomy laboratory setting, and its effectiveness as a tool for student self-assessment and learning.
The subjects were 224 students enrolled in Neuroscience I over a period of four trimesters. Regional neuroanatomy multiple choice question (MCQ) quizzes were administered the week following topic presentation in the laboratory. A total of six quizzes was administered during the trimester; each MCQ was paired with a confidence question, and the paired questions were scored using a three-level CBM scoring scheme.
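The abstract does not give the point values used in the quizzes; the sketch below assumes Gardner-Medwin's widely used three-level scheme (+1/+2/+3 for correct answers at low/medium/high confidence, 0/−2/−6 for wrong ones), under which it only pays to claim high confidence when it is genuinely warranted.

```python
# Gardner-Medwin's three-level CBM scheme: the reward for a correct answer
# grows with stated confidence, but so does the penalty for a wrong one.
CBM_POINTS = {
    1: (1, 0),    # low confidence:    +1 correct,  0 wrong
    2: (2, -2),   # medium confidence: +2 correct, -2 wrong
    3: (3, -6),   # high confidence:   +3 correct, -6 wrong
}

def cbm_score(answers):
    """answers: iterable of (is_correct: bool, confidence: 1|2|3) pairs."""
    return sum(CBM_POINTS[c][0] if ok else CBM_POINTS[c][1]
               for ok, c in answers)

# e.g. two confident hits and one confident miss cancel out:
print(cbm_score([(True, 3), (True, 3), (False, 3)]))  # 3 + 3 - 6 = 0
```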
Spearman's rho correlation coefficients indicated that the number of correct answers was highly correlated with the CBM score (high, medium, low) for each topic. The χ² analysis within each neuroscience topic showed that the distribution of students into low, medium, and high confidence levels was a function of the number of correct answers on the quiz (p < .05). Pairwise comparisons of quiz performance with CBM score as the covariate showed that students' level of understanding of course content was greatest for information related to the spinal cord and medulla, and least for information related to the midbrain and cerebrum.
CBM is a reliable strategy for challenging students to think discriminately, based on their knowledge of the material. The three-level CBM scoring scheme was a valid tool for assessing student learning of core neuroanatomic topics regarding structure and function.
Chiropractic; Educational Measurement; Learning
Multiple-choice question (MCQ) examinations are increasingly used as the assessment method of theoretical knowledge in large class-size modules in many life science degrees. MCQ tests can be used to objectively measure factual knowledge, ability, and high-level learning outcomes, but may also introduce gender bias in performance depending on topic, instruction, scoring, and difficulty. The 'single answer' (SA) format, in which students choose one correct answer, is often used, but it gives students no way to demonstrate partial knowledge. Negative marking eliminates the chance element of guessing but may be considered unfair. Elimination testing (ET) is an alternative form of MCQ which discriminates between all levels of knowledge while rewarding the demonstration of partial knowledge. Comparisons of performance and gender bias in negatively marked SA and ET tests have not yet been performed in the life sciences. Our results show that life science students were significantly advantaged by answering the MCQ test in elimination format rather than single-answer format under negative marking conditions, because partial knowledge of topics was rewarded. Importantly, we found no significant difference in performance between genders in either cohort for either MCQ test under negative marking conditions. Surveys showed that students generally preferred ET-style MCQ testing over SA-style testing. Students reported feeling more relaxed taking ET MCQs and more stressed when sitting SA tests, while disagreeing with being distracted by thinking about the best tactics for scoring highly. Students agreed that ET testing improved their critical thinking skills. We conclude that appropriately designed MCQ tests do not systematically discriminate between genders. We recommend careful consideration in choosing the type of MCQ test, and propose applying negative scoring conditions to each test type to avoid the introduction of gender bias. The student experience could be improved through the incorporation of elimination answering methods in MCQ tests, rewarding partial as well as full knowledge.
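Scoring rules for elimination testing vary across studies, and the abstract does not specify the one used here; the Python sketch below shows one common family of rules, in which each correctly eliminated distractor earns a point and eliminating the key costs n−1 points, so partial knowledge earns partial credit while misinformation is penalised.

```python
def elimination_score(eliminated, key, options):
    """Score one elimination-testing item: +1 for every distractor the
    student crosses out, minus (n - 1) if the correct answer is crossed
    out. Full knowledge (all distractors eliminated) earns n - 1 points."""
    n = len(options)
    score = sum(1 for opt in eliminated if opt != key)
    if key in eliminated:
        score -= (n - 1)
    return score

opts = ["A", "B", "C", "D"]
print(elimination_score({"B", "C", "D"}, key="A", options=opts))  # 3: full knowledge
print(elimination_score({"B", "C"}, key="A", options=opts))       # 2: partial knowledge
print(elimination_score({"A", "B", "C"}, key="A", options=opts))  # -1: misinformed
```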
Anxiety is thought to affect test performance. Studies have shown that students with low levels of test anxiety achieve higher scores on multiple choice question (MCQ) examinations than those with high anxiety levels. Female students have been shown to have higher test anxiety levels than male students. Standardized patient (SP) examinations are being used in medical schools and for licensing purposes. As SP exams are relatively new, there are few studies measuring anxiety levels for the SP test. The purpose of this study was to measure and compare medicine clerkship student SP versus MCQ examination anxiety levels and to determine if level affected test performance.
The Spielberger Test Attitude Inventory was used to measure anxiety in 150 students rotating through the clerkship. Students completed questionnaires after the MCQ and SP examinations. Mean examination scores and anxiety levels were compared. Based on questionnaire scores, students were divided into 3 groups: low, moderate, and high anxiety. The MCQ and SP examination scores were analyzed to determine whether anxiety level affected test performance for male and female students.
There were no meaningful differences in anxiety levels between the SP and MCQ examinations. No inverse relationship between anxiety level and test scores was identified. Female students had higher anxiety levels, but sex differences did not influence examination performance.
Medicine clerkship student test performance is not affected by anxiety level. Implications of the findings for incorporating stress management training in medical school curricula and suggestions for future research are discussed.
standardized patient evaluation; medicine clerkship; test anxiety
Global cognitive and psychomotor assessment in simulation-based curricula is complex. We describe the assessment of novices' cognitive skills in a trauma curriculum using a simulation-aligned, facilitated-discovery method.
Third-year medical students in a surgery clerkship completed two student-written simulation scenarios (SWSS) as an assessment method in a trauma curriculum employing high fidelity human patient simulators (manikins). SWSS consisted of written physiologic parameters, intervention responses, a performance evaluation form, and a critical interventions checklist.
Seventy-one students participated. SWSS scores were compared to multiple choice test (MCQ), checklist-graded solo performance in a trauma scenario (STS), and clerkship summative evaluation grades. The SWSS appeared to be slightly better than STS in discriminating between Honors and non-Honors students, although the mean scores of Honors and non-Honors students on SWSS, STS, or MCQ were not significantly different. SWSS exhibited good equivalent form reliability (r=0.88), and higher interrater reliability versus STS (r=0.93 vs r=0.79).
SWSS is a promising assessment method for simulation-based curricula.
The purpose of this study is to describe an approach for evaluating assessments used in the first 2 years of medical school and report the results of applying this method to current first and second year medical student examinations.
Three faculty members coded all exam questions administered during the first 2 years of medical school. The reviewers discussed and compared the coded exam questions, and during bi-monthly meetings all differences in coding were resolved, with consensus as the final criterion. We applied Moore's framework to assist the review process and to align it with National Board of Medical Examiners (NBME) standards.
The first- and second-year medical school examinations contained no questions at the competence level. The majority of test questions, more than 50%, were at the NBME recall level.
It is essential that multiple-choice questions (MCQs) test attitudes, skills, knowledge, and competency in medical school. Based on our findings, it is evident that our exams need to be improved to better prepare our medical students for successful completion of the NBME Step exams.
undergraduate medical education; assessment; course exams; NBME
Learning science requires higher-level (critical) thinking skills that need to be practiced in science classes. This study tested the effect of exam format on critical-thinking skills. Multiple-choice (MC) testing is common in introductory science courses, and students in these classes tend to associate memorization with MC questions and may not see the need to modify their study strategies for critical thinking, because the MC exam format has not changed. To test the effect of exam format, I used two sections of an introductory biology class. One section was assessed with exams in the traditional MC format, the other section was assessed with both MC and constructed-response (CR) questions. The mixed exam format was correlated with significantly more cognitively active study behaviors and a significantly better performance on the cumulative final exam (after accounting for grade point average and gender). There was also less gender-bias in the CR answers. This suggests that the MC-only exam format indeed hinders critical thinking in introductory science classes. Introducing CR questions encouraged students to learn more and to be better critical thinkers and reduced gender bias. However, student resistance increased as students adjusted their perceptions of their own critical-thinking abilities.
There has been comparatively little consideration of the impact that the changes to undergraduate curricula might have on postgraduate academic performance. This study compares the performance of graduates by UK medical school and gender in the Multiple Choice Question (MCQ) section of the first part of the Fellowship of the Royal College of Anaesthetists (FRCA) examination.
Data from each sitting of the MCQ section of the primary FRCA examination from June 1999 to May 2008 were analysed for performance by medical school and gender.
There were 4983 attempts at the MCQ part of the examination by 3303 graduates from the 19 United Kingdom medical schools. Using the standardised overall mark minus the pass mark, graduates from five medical schools performed significantly better than the mean for the group and graduates from five schools performed significantly worse. Males performed significantly better than females in all aspects of the MCQ: physiology, mean difference = 3.0% (95% CI 2.3, 3.7), p < 0.001; pharmacology, mean difference = 1.7% (95% CI 1.0, 2.3), p < 0.001; physics with clinical measurement, mean difference = 3.5% (95% CI 2.8, 4.1), p < 0.001; overall mark, mean difference = 2.7% (95% CI 2.1, 3.3), p < 0.001; and standardised overall mark minus the pass mark, mean difference = 2.5% (95% CI 1.9, 3.1), p < 0.001. Graduates from three medical schools that changed from traditional to problem-based learning curricula showed no change in performance in any aspect of the MCQ pre- and post-curriculum change.
Graduates from the different medical schools in the UK do show differences in performance in the MCQ section of the primary FRCA, but significant curriculum change does not lead to deterioration in postgraduate examination performance. Whilst females now outnumber males taking the MCQ, they are not performing as well as males.
Surgical skills courses are an important part of learning during surgical training. The assessments at these courses tend to be subjective and anecdotal. Objective assessment using multiple choice questions (MCQs) quantifies the learning experience for both the organisers and the participants.
MATERIALS AND METHODS
Participants of the open shoulder surgical skills course conducted at The Royal College of Surgeons of England in 2005 and 2006 underwent assessment using MCQs prior to and after the course.
The participants were grouped as non-consultants (14) and consultant orthopaedic surgeons (8). All participants improved after attending the course. The average improvement was 17% (range, 4–43%). We compared the two groups while adjusting for the association between pre-course score and score gain. We found a strong correlation between pre-course score and score gain (r = 0.734; P = 0.001). Adjusted for pre-course score, we found that the score gain (learning) for the non-consultants was slightly larger than for the consultants, but this did not reach statistical significance (P = 0.247).
All participants had a positive learning experience, which did not correlate significantly with the grade of surgeon.
Assessment; Surgical skills course
The postgraduate training program in psychiatry in Saudi Arabia, established in 1997, is a 4-year residency program. Written exams comprising multiple choice questions (MCQs) are used as a summative assessment of residents in order to determine their eligibility for promotion from one year to the next. Test blueprints are not used in preparing the examinations.
To develop test blueprints for the written examinations used in the psychiatry residency program.
Based on the guidelines of four professional bodies, documentary analysis was used to develop global and detailed test blueprints for each year of the residency program. An expert panel participated in piloting and final modification of the test blueprints. Their opinions about the content, the weightage for each content domain, and the proportion of test items to be sampled in each cognitive category, as defined by modified Bloom's taxonomy, were elicited.
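To make this concrete, the fragment below sketches the kind of information a global blueprint of this sort encodes: a weightage per content domain and a split of each domain's items across cognitive categories. The domain names, weights, and proportions are entirely hypothetical and are not taken from the study.

```python
# Hypothetical global-blueprint fragment for a 100-item paper.
blueprint = {
    "Mood disorders":      {"weight": 0.20, "recall": 0.50, "application": 0.30, "problem_solving": 0.20},
    "Psychotic disorders": {"weight": 0.15, "recall": 0.40, "application": 0.35, "problem_solving": 0.25},
}

TOTAL_ITEMS = 100
for domain, spec in blueprint.items():
    n_items = round(spec["weight"] * TOTAL_ITEMS)
    split = {level: round(n_items * spec[level])
             for level in ("recall", "application", "problem_solving")}
    print(f"{domain}: {n_items} items -> {split}")
# Mood disorders: 20 items -> {'recall': 10, 'application': 6, 'problem_solving': 4}
# Psychotic disorders: 15 items -> {'recall': 6, 'application': 5, 'problem_solving': 4}
```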
Eight global and detailed test blueprints, two for each year of the psychiatry residency program, were developed. The global test blueprints were reviewed by experts and piloted. Six experts participated in the final modification of the test blueprints. Based on expert consensus, the content, the total weightage for each content domain, and the proportion of test items to be included in each cognitive category were determined for each global test blueprint. Experts also suggested progressively decreasing the weightage for recall test items and increasing that for problem-solving test items from year 1 to year 4 of the psychiatry residency program.
A systematic approach using documentary and content analysis techniques was used to develop the test blueprints, with additional input from an expert panel as appropriate. Test blueprinting is an important step in ensuring test validity in all residency programs.
test blueprinting; psychiatry; residency program; summative assessment; documentary and content analysis; Kingdom of Saudi Arabia
Background: Multiple choice questions (MCQs) are often used in medical education exams and need careful quality management, for example through the use of review committees. This study investigates whether groups communicating virtually by email perform similarly to face-to-face groups in the review process and whether a facilitator has positive effects.
Methods: Sixteen small groups of students, which had to evaluate and correct MCQs under four different conditions, were examined. In the second part of the investigation, the revised questions were given to a new random sample for judgement of item quality.
Results: There was no significant influence of the variables "form of review committee" and "facilitation". However, face-to-face and virtual groups clearly differed in the time they required. The condition "face-to-face without facilitation" was rated most positively with regard to taking responsibility, approach to work, sense of well-being, motivation, and concentration on the task.
Discussion: Face-to-face and virtual groups are equally effective in the review of MCQs but differ in efficiency. Electronic review appears feasible but is hardly advisable because of the long processing time and technical problems.
multiple choice questions; MCQ; face-to-face; virtual; facilitation; review committee
As assessment has been shown to direct learning, it is critical that the examinations developed to test clinical competence in medical undergraduates are valid and reliable. The use of extended matching questions (EMQ) has been advocated to overcome some of the criticisms of using multiple-choice questions to test factual and applied knowledge.
We analysed the results from the extended matching questions examination taken by 4th-year undergraduate medical students in the academic year 2001 to 2002. Rasch analysis was used to examine whether the set of questions used in the examination mapped onto a unidimensional scale, the degree of difficulty of questions within and between the various medical and surgical specialties, and the pattern of responses within individual questions, in order to assess the impact of the distractor options.
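For reference, the dichotomous Rasch model underlying this analysis gives the probability that person n answers item i correctly as a function of person ability θ_n and item difficulty b_i; fit to this model is what licenses treating the summed score as a single unidimensional measure:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$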
Analysis of a subset of items and of the full examination demonstrated internal construct validity and the absence of bias on the majority of questions. Three main patterns of response selection were identified.
Modern psychometric methods based upon the work of Rasch provide a useful approach to the calibration and analysis of EMQ undergraduate medical assessments. The approach allows a formal test of the unidimensionality of the questions and thus of the validity of the summed score. Given the metric calibration that follows fit to the model, it also allows for the establishment of item banks to facilitate continuity and equity in examination standards.
Undergraduate medical examination is undergoing extensive re-evaluation, with new core educational objectives being defined. Consequently, new examination systems have been designed to test these objectives; the objective structured practical examination (OSPE) is one of them.
To introduce the OSPE as a method of assessing practical skills and learning, and to determine student satisfaction with the OSPE; furthermore, to explore faculty perceptions of the OSPE as a learning and assessment tool.
Materials and Methods:
The first M.B.B.S. students of the 2011–12 batch of Medical College, Kolkata, were the subjects of the study. An OSPE was organized and conducted on "Identification of Unknown Abnormal Constituents in Urine." The reliability of the questions administered was assessed by calculating Cronbach's alpha. A questionnaire on the various components of the OSPE was administered to obtain feedback.
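As an aside on the reliability computation: Cronbach's alpha can be computed directly from the per-item score matrix, as in this minimal Python sketch (the data shown are made up; the study's actual item scores are not reproduced here).

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """item_scores: one list of scores per item, each with one entry per
    examinee. alpha = k/(k-1) * (1 - sum of item variances / variance of
    the examinees' total scores)."""
    k = len(item_scores)
    totals = [sum(per_examinee) for per_examinee in zip(*item_scores)]
    sum_item_var = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))

# Three items, five examinees (hypothetical 0/1 marks):
items = [[1, 0, 1, 1, 0],
         [1, 0, 1, 1, 1],
         [1, 0, 0, 1, 0]]
print(round(cronbach_alpha(items), 2))  # 0.79
```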
Sixteen students failed to achieve an average of 50% or above in the assessment; 49 students achieved more than 75%, 52 students scored between 65% and 75%, and 29 students scored between 50% and 65%. Cronbach's alpha showed the questions administered to have high internal consistency, with a value of 0.80. Ninety-nine percent of students believed that the OSPE helps them to improve, and 81% felt that this type of assessment works as both a learning and an evaluation tool. Faculty feedback reflected that such assessment tested objectivity, measured practical skills better, and largely eliminated examiner bias.
The OSPE tests the different desired components of competence better and eliminates examiner bias. Student feedback reflects that such assessment helps them to improve, as it is effective as both a teaching and an evaluation tool.
Biochemistry; evaluation; objective structured practical examination
Many North American medical schools have removed didactic surgical teaching from the nonclinical years, and there has been a trend toward shortening surgical clerkships. Of concern is that this policy has led to a decrease in surgical exposure and a diminished interest in students pursuing a surgical career. We aimed to determine the effect of curricular change on practical experiences during surgical clerkship and to evaluate overall practical clinical exposure of students during surgical clerkship.
We collected validated experience logbooks completed before (1999–2001) and after (2001–2003) the curriculum change at the University of Alberta and converted them into electronic format. The study analyzed 10 procedures and 5 patient management situations. We assessed numbers of procedures performed and student performance on the Objective Structured Clinical Exam (OSCE) and Multiple-Choice Question (MCQ) examinations before and after the curriculum change. In addition, we completed an overall survey of all 4 classes (2000, 2001, 2002, 2003), measuring clinical exposure. We reviewed a total of 428 logbooks.
There were significant gaps in clinical exposure, demonstrated by more than 70% of students in each class failing to complete 8 of the 15 procedures or management situations at least once. No significant change in practical surgical exposure resulted from the curriculum change. The curriculum change did, however, result in a decrease in end-of-rotation MCQ performance, with a 5% decrease in the class average. Students' performance on ward evaluations and their OSCE scores were unaffected.
We were encouraged that a major change in how surgical education is delivered did not have a detrimental effect on subjective and objective evaluations of student performance. However, we are concerned that a considerable number of students appeared to have not performed several inpatient procedures. Further study is warranted to determine whether this is a common problem in other schools. There is a clear need at our school, and no doubt at others, to establish skills centres and other strategies to ensure that this component of medical education is appropriately and effectively taught.
Traditionally, the modified essay question (MEQ) paper has attempted to test problem solving and decision making, based on an ongoing family saga, using seven or eight questions to be answered in 90 minutes. Candidates' scripts are double marked by two College examiners. This format imposes constraints on the range of questions asked and results in contrived scenarios. It is possible to be 'coached' for this paper, and double marking is expensive in examiner time. Recent studies show that validity and reliability are improved by increasing the number and range of questions in a 'surgery type' paper. Single marking has been instituted, and the MEQ paper will in future consist of 10 or more questions to be answered in 2 hours. Examiners' marking performance is monitored by senior examiners. Technical and statistical considerations are discussed, as are the implications for candidates and course organizers.
To develop and psychometrically assess a multiple choice question (MCQ) instrument to test knowledge of depression and its treatments in patients suffering from depression.
A total of 63 depressed patients and 12 psychiatric experts participated. Based on empirical evidence from an extensive review, theoretical knowledge, and consultations with experts, a 27-item MCQ test of knowledge of depression and its treatments was constructed. Data collected from the psychiatric experts were used to assess evidence of content validity for the instrument.
Cronbach's alpha for the instrument was 0.68, and there was an overall 87.8% agreement among experts that the MCQ items were highly relevant for testing patient knowledge of depression and its treatments. Patients performed satisfactorily overall on the MCQs, with 78.7% correct answers. Item analysis indicated that most items had adequate difficulty and discrimination.
The instrument showed adequate reliability and evidence of content and convergent validity. Future research should employ a larger and more heterogeneous sample, drawn from both psychiatric and community populations, than the present study did. Meanwhile, the present study has produced a psychometrically tested instrument for measuring depressed patients' knowledge of depression and its treatment.