The assessment of problem-solving skills, and specifically diagnostic skills, was once reserved for examination formats such as free-response questions, patient management problems (PMPs) or oral examinations. These evaluation methods, however, are all resource-intensive, which makes it difficult to provide the representative sampling of problems necessary to circumvent the problem of case specificity: success in solving one clinical presentation does not predict success in another [1]. As a consequence of case specificity, the reliability and content validity of an examination depend on a broad sampling of problems. Such extensive sampling is more easily achieved with pencil-and-paper tests. This study will examine two pencil-and-paper formats, specifically with regard to their relative ability to test problem solving.
Previous literature has demonstrated that the item stem tends to determine the clinical challenge, while psychometric properties such as discrimination and difficulty tend to be affected by the number of answer options [2], hereafter referred to as the 'number of alternatives'. The central question of this paper is whether altering the number of alternatives within a pencil-and-paper format alters diagnostic higher-order thinking and/or the psychometric properties of the format. Two formats were studied, both with a stem consisting of a long vignette with distracters, but with different numbers of alternatives. The format presenting five options to the examinee will henceforth be referred to as the "multiple-choice question" or MCQ format, while the second format, presenting more than ten options to the examinee, will be referred to as the "extended-matching" or EMQ format.
The first examination format studied is the five-option MCQ (see Appendix A for an example). Although MCQs have always been considered an efficient and reliable testing tool, they have not always been perceived as ideal for the evaluation of higher-order thinking skills such as problem solving. Perceptions that MCQs assess lower levels of knowledge, such as recall of isolated facts, and/or encourage trivialization persist in the medical education community [3]. That some clinicians question whether MCQs can test problem-solving skills suggests that this format may have low validity [4]. However, as discussed by Case and Swanson [5], well-constructed MCQs could challenge students to problem solve. Maguire et al. also recognized that MCQs could yield valid information on clinical reasoning skills, provided that stems and alternatives are well constructed [6]. Evidence does exist that MCQs have predictive value for more recognized problem-solving tasks [7] and can elicit higher-order problem solving such as forward reasoning [8
The second examination format is the EMQ format, initially designed in response to some of the criticisms of MCQs. EMQs (see Appendix A for an example) were introduced in the 1990s in both the NBME and USMLE, among others. Case and Swanson [5] have been instrumental in the development of these questions, which are defined as any matching format with more than the five alternatives traditionally used by MCQs. From their inception, EMQs have been prepared with careful attention to designing stems that test higher cognitive levels such as problem solving. The first study that examined the psychometric features of extended-matching questions [5] showed that extended-matching items were more difficult, more discriminating, had higher reliability and needed significantly less testing time to achieve reproducible scores than traditional MCQs. Other studies have shown that EMQs, by increasing the number of alternatives, increased mean item difficulty and, perhaps by reducing guessing, improved item discrimination over the five-option MCQ [9]. By increasing item discrimination, EMQs offer comparable levels of reproducibility with 30% fewer items than the five-option MCQ [9]. Reliability coefficients were also markedly higher with extended-matching questions [5]. Positive psychometric outcomes have been found in other studies using the format [10
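For readers less familiar with the psychometric terms used above, the following is a minimal illustrative sketch, not the analysis used in the cited studies. It assumes the classical test theory definitions: item difficulty as the proportion of examinees answering correctly (so a lower value means a harder item) and item discrimination as the point-biserial correlation between the item score and the total test score. The function names and the five-examinee data set are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Proportion of examinees answering the item correctly (0/1 scores).

    A lower value indicates a more difficult item.
    """
    return mean(item_scores)


def item_discrimination(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item score and the total score."""
    p = mean(item_scores)
    sd = pstdev(total_scores)  # population standard deviation of total scores
    if p in (0, 1) or sd == 0:
        return 0.0  # correlation is undefined here; 0.0 is returned by convention
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    return (mean(correct) - mean(incorrect)) / sd * (p * (1 - p)) ** 0.5


def chance_level(n_alternatives):
    """Probability of answering correctly by blind guessing among equally likely options."""
    return 1 / n_alternatives


# Hypothetical data: one item's scores and total test scores for five examinees.
item = [1, 1, 1, 0, 0]
totals = [9, 8, 7, 4, 2]

difficulty = item_difficulty(item)
discrimination = item_discrimination(item, totals)
```

Under these assumptions, the chance level falls from 0.2 with five alternatives to below 0.1 with more than ten, which is one plausible route by which longer option lists could reduce guessing.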
These studies have focused on psychometrics; the potential benefits of the EMQ format over standard MCQs in eliciting higher-order problem solving, and the possible reasons for such benefits, remain unclear. No study has formally used think-aloud protocols to assess whether a well-written MCQ differs from an EMQ in challenging examinees to problem solve. There is little doubt that poorly written MCQs can encourage students to learn isolated facts by rote. In fact, all available evaluation methods can potentially yield information on clinical reasoning if the content is appropriate, suggesting that content is more important than question type [15
The two examination formats will be tested for their ability to elicit the three diagnostic reasoning strategies generally available to learners: hypothetico-deductive reasoning, pattern recognition, and scheme-inductive reasoning. Hypothetico-deductive reasoning [16] is a "to-and-fro" strategy of problem solving, also termed "backward reasoning". The method is generally used by novices or experienced diagnosticians to include or exclude a single diagnosis, when faced with a particularly complex problem, or as a fallback strategy when faced with clinical problems that are outside their domains of expertise.
Pattern recognition has been identified in other research as a very successful approach used by experts to solve clinical problems [17]. Before becoming more expert in problem solving, learners progress through several transitional stages characterized by different knowledge structures: elaborated causal networks, abridged networks, illness scripts, and instance scripts [18]. Extensive experience eventually leads to the acquisition of a repertoire of problems common to the domain of expertise, termed "illness scripts". This repertoire permits problem resolution by recognizing new problems as similar or identical to ones already solved and recalling their solutions.
The third strategy is scheme-inductive reasoning. "Schemes" are defined as a mental categorization of knowledge that includes a particular organized way of understanding and responding to a complex situation. They are drawn on paper like "inductive trees" or "road maps" to recreate the major divisions (or chunks) used by expert clinicians for both storage of knowledge in memory and its retrieval for solving problems [19] (see Figure for an example of the scheme for "dysphagia"). Decisions are made explicitly at the forks in the road or the branchings of the tree. The organizational structure, or "scheme", proceeds from alternative causes in a forward direction, through crucial "tests", to the exclusion of some alternative causes and the adoption of what is left. These tests may be based on an evaluation of symptoms, signs, or results of investigations, singly or in any combination. Scheme-inductive reasoning is a strategy used by experts when pattern recognition is not possible [21]. This type of problem solving represents the "climbing of a conditional inductive tree" [22
Example of the scheme for "dysphagia".
By directly comparing the problem-solving strategies elicited by the two pencil-and-paper formats, using the think-aloud method previously described, two major questions will be addressed. The first question is whether pencil-and-paper formats such as the EMQ and MCQ can in fact assess problem-solving skills. The capacity of the examination formats to evoke more 'expert' methods of problem solving, such as scheme-inductive reasoning or pattern recognition, will be taken as evidence of their ability to assess problem-solving skills. The second question concerns the impact of the number of alternatives on psychometric properties and diagnostic higher-order thinking, considering that a shift to hypothetico-deductive reasoning could conceivably occur with the shorter lists of alternatives in the MCQ format. A corollary to these questions is whether, in testing problem solving, it is the construction of question stems that is important, as opposed to the number of alternatives or the examination format.