Results from this study show that only 13.8% of all items had three functioning distractors and just over 70% had only one or two functioning distractors. The low proportion of items with three functioning distractors was not altogether surprising given that all tests were generated by in-house teaching faculty, most of whom have minimal training in item writing – a situation that is likely similar to most tertiary education settings. Furthermore, other research suggests that even professionally developed test items on standardized exams rarely have more than two functional distractors. Haladyna and Downing [7] found that approximately two-thirds of all four-option items they reviewed had only one or two functioning distractors and none of the five-option items had four functioning distractors. Because it is often difficult for teachers to develop three or more equally plausible distractors, additional distractors are often added as "fillers." An item with two plausible distractors, however, is preferable to an item with three or four implausible distractors [4], as students rarely select these options anyway. More is not necessarily better when producing distractors – the key is the quality of the distractors, not the number [6]. The low frequency of items with more than two functioning distractors and the finding that only about one-half of all distractors were functioning suggest that three-option items are the most practical choice for in-house tests. Haladyna and Downing [7] concluded that because so few items had more than two functioning distractors, "three options may be a natural limit for multiple-choice item writers in most circumstances" (p. 1008). A meta-analysis of 80 years of research on the number of options in MCQs also concluded that three options is optimal for MCQs in most settings [29].
That said, there is no psychometric reason that all items must have the same number of options, as some questions naturally have more or fewer plausible distractors than others [30]. So while three options would be sufficient in most circumstances, item writers should write as many good distractors as is feasible given the content area being assessed [6]. Additionally, when reviewing item performance on previous tests, test developers and item writers should not eliminate options that perform adequately simply to conform to a pre-set number of options [16]. Many teacher-developed tests, however, particularly summative tests, must conform to institutional guidelines on how many options test items have. These guidelines are rarely evidence-based [31] and are more likely to be based on routine practices and/or set procedures. Teachers often do not have the flexibility to set items with varying numbers of options. In such circumstances, given the low proportion of items with four functioning distractors, three-option items would appear to be the most reasonable choice.
Of further concern is the high proportion of items that did not have any functioning distractors (12.3%). These items would inevitably have high item difficulty statistics (>.90), with almost all students getting the items correct. When absolute pass scores are used and set at a fixed percentage (e.g., 50%), as they are in the institution where these tests were administered, such a high proportion of easy items likely results in many borderline candidates passing who should not. Pass standards should be set relative to the difficulty of the test using one of a number of established procedures (e.g., the Angoff method or the Ebel procedure) [32], not simply by using a common but arbitrary figure such as 50%.
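To illustrate how a pass standard can be tied to test difficulty, the following is a minimal sketch of a basic Angoff-style calculation. It assumes the common variant in which each judge estimates, for every item, the probability that a borderline candidate answers correctly, and the cut score is the mean of those estimates. The function name `angoff_cut_score` and the ratings are hypothetical and are not taken from this study.

```python
def angoff_cut_score(ratings):
    """Return the cut score as a proportion of total marks.

    ratings[j][i] is judge j's estimate of the probability that a
    borderline candidate answers item i correctly.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    # Average the judges' estimates for each item.
    item_means = [
        sum(judge[i] for judge in ratings) / n_judges for i in range(n_items)
    ]
    # The expected score of a borderline candidate, as a proportion.
    return sum(item_means) / n_items


# Hypothetical ratings: three judges, five items, several of them easy.
ratings = [
    [0.95, 0.90, 0.60, 0.70, 0.85],
    [0.90, 0.95, 0.55, 0.65, 0.80],
    [0.95, 0.85, 0.65, 0.75, 0.90],
]
cut = angoff_cut_score(ratings)
print(f"Angoff cut score: {cut:.1%}")
```

With these illustrative ratings, the resulting standard sits well above 50%, which is the kind of gap a fixed pass mark ignores when a test contains many easy items.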
Although MCQs with three functioning distractors produced the most discriminating items in this study, this relationship should be viewed with caution, as option discrimination and item discrimination are closely related and it is inevitable that items with more discriminating options are more discriminating overall. Items in this study with more functioning distractors were also more difficult than items with fewer functioning distractors. There was, however, little difference in item difficulty between items with two and three functioning distractors. Other research comparing item discrimination and difficulty when the number of options was reduced has found no difference in the shorter items. Owen and Froman [16] randomly administered 100 items to 114 undergraduate students as either five-option items or three-option items and found no significant differences in either item discrimination or difficulty. In comparing five-option items with both three- and four-option items, Trevisan et al. [19] found that three-option items were more discriminating and had fewer items with non-performing distractors than five-option items. A review of numerous studies concluded that reducing items from four options to three options decreases item difficulty (.04), increases item discrimination (.03), and also increases reliability (.02) [29]. Conversely, developing new three-option items without the benefit of knowing how items have already performed may not produce the same improvements in item and test psychometric properties as reducing the number of options in previously tested items [18]. If three-option items are not well constructed and the two available distractors are non-functioning, overall test scores would increase substantially. When developing new items, irrespective of the number of options, items should be developed by content experts in accordance with accepted item writing guidelines and peer reviewed prior to use to ensure that the answer is unambiguously correct and that all distractors are plausible [33].
Despite a growing body of research supporting the use of three-option MCQs, this format continues to be the exception rather than the norm. Large testing bodies [34], item-writing textbooks [16], instructor's manuals and MCQ item banks [35] rely on either four- or five-option MCQs. Hence, most teacher-developed MCQs in health-science disciplines are either four- or five-option items. Why teachers have been reluctant to use three-option MCQs is unclear. It may be that longer, more complex items appear to be more rigorous [16]. Teachers may also feel that three-option MCQs increase weaker students' chances of guessing the correct option [18]. Furthermore, teaching and assessment practices are often handed down from senior to junior teachers and four- or five-option items are the traditional MCQ format [16]. Finally, it may also be that teachers themselves have little control over the format and type of items used in institutional assessments. These policies may be set by administrators who, for the same reasons identified above, are reluctant to use fewer than four or five options on summative tests.
Three-option MCQs, however, offer many benefits to teachers. First, fewer options reduce testing time [6]. Alternatively, with fewer options, more items can be added to tests to increase content sampling while keeping testing time constant. Aamodt & McShane [11] estimated that on three-option tests, students can complete an additional 12.4 MCQs in the same time required to complete 100 four-option items. A greater number of items also has the additional benefit of increasing test reliability. Additionally, writing only three options per item saves time generating items. Generating plausible options is time-consuming, and if each distractor takes five minutes to generate, writing three-option instead of five-option items will save over 16 hours on a 100-item test [18]. Furthermore, our simulated analysis demonstrates that reducing the number of options from four to three does not result in substantially higher scores as a result of guessing. Overall, there was only a 1% increase in mean test scores after removal of the least functioning distractor. The effect of guessing on multiple-choice test scores is often overestimated, and our analysis is consistent with other research which found that on a 100-item test, reducing items from four or five to three options resulted in a test-score increase of only 1.22 points [11].
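The arithmetic behind the time-saving estimate above, and the claim that a longer test is more reliable, can be sketched as follows. The reliability projection uses the Spearman-Brown prophecy formula, which assumes the added items are comparable in quality to the existing ones; it is a technique chosen here for illustration, not one named in the text. The function names and the starting reliability of 0.80 are hypothetical.

```python
def hours_saved(n_items, options_from, options_to, minutes_per_distractor):
    """Item-writing time saved by writing fewer distractors per item."""
    # Each item has one correct answer; the rest are distractors.
    distractors_dropped = (options_from - 1) - (options_to - 1)
    return n_items * distractors_dropped * minutes_per_distractor / 60


def spearman_brown(reliability, length_factor):
    """Predicted reliability after multiplying test length by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# 100 items x 2 fewer distractors x 5 minutes = 1000 minutes.
saved = hours_saved(
    n_items=100, options_from=5, options_to=3, minutes_per_distractor=5
)
print(f"Writing time saved: {saved:.1f} hours")

# 12.4 extra items per 100 is the estimate cited in the text;
# the starting reliability of 0.80 is a hypothetical value.
longer = spearman_brown(reliability=0.80, length_factor=112.4 / 100)
print(f"Predicted reliability of the longer test: {longer:.3f}")
```

Under these assumptions the gain in reliability from roughly 12 extra items is modest, so the practical value of the extra items may lie as much in broader content sampling as in reliability.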
Results from this study also highlight the importance of reviewing item performance after test administration and using these results to eliminate non-functioning distractors to improve test items in future administrations of the test. The performance of each test item, along with each distractor, should be assessed using item analysis procedures. Item analysis procedures involve examining the statistical properties of test items in relation to a response distribution [25]. Distractors selected by fewer than 5% of students, or those with discrimination statistics ≥ 0, can easily be identified and modified or removed in future tests. Teachers and test developers can expect that 50% or more of the items they write will fail to perform as expected [37]. Therefore, item analysis provides valuable data for question improvement and should be incorporated into the process of test development and review. It is only through this iterative process of item analysis and improvement that pedagogically and psychometrically sound tests can be developed.
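The distractor screening described above can be sketched in a few lines. This is a minimal illustration that applies the two criteria from the text (selected by fewer than 5% of students, or a discrimination statistic ≥ 0). It assumes discrimination is measured as the point-biserial correlation between choosing an option and the total test score, which may differ from the statistic used in this study. The function names and response data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def distractor_report(choices, totals, key, options, min_freq=0.05):
    """Summarise each distractor of one item and flag whether it functions.

    choices: option chosen by each student
    totals:  each student's total test score, in the same order
    key:     the correct option
    """
    n = len(choices)
    report = {}
    for opt in options:
        if opt == key:
            continue  # only distractors are screened
        picked = [1 if c == opt else 0 for c in choices]
        freq = sum(picked) / n
        disc = pearson(picked, totals)
        report[opt] = {
            "frequency": freq,
            "discrimination": disc,
            # Functioning: chosen often enough AND chosen by weaker students.
            "functioning": freq >= min_freq and disc < 0,
        }
    return report


# Hypothetical responses from 20 students to one item whose key is "A".
choices = ["A"] * 12 + ["B"] * 5 + ["C"] * 3
totals = [80, 85, 90, 75, 70, 88, 92, 78, 83, 76, 81, 79,  # chose A
          40, 45, 50, 42, 48,                              # chose B
          95, 96, 94]                                      # chose C
report = distractor_report(choices, totals, key="A", options="ABCD")
for opt, stats in report.items():
    print(opt, stats)
```

In this made-up example, option B functions (chosen by a quarter of students, mostly low scorers), option C is flagged because it attracts the highest scorers, and option D is flagged because nobody selects it.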