Below we highlight key or often-neglected quality elements for specific study “types.” These types comprise a mixture of study purposes, designs, methods, and manuscript categories frequently found among JGIM medical education submissions, and are neither mutually exclusive nor all inclusive. Many studies will use a combination of these types.
This is not a comprehensive list of standards, even for a given type. Absence from discussion below does not mean a quality element is unimportant, but might simply mean we perceive it as less frequently problematic. Authors should continue to refer to relevant sources to guide the systematic design, conduct, and reporting of research.26–29,48–50
Likewise, the absence of a specific study type does not indicate that we do not value such scholarship. We do not address systematic reviews here, but would welcome rigorous reviews on important education questions51 and refer authors to published guidelines.52–54
Similarly, we accept and encourage theory-building and programmatic research.21,22,24
JGIM’s “Educational Innovations” are “succinct descriptions of innovative approaches to improving medical education” and often represent the product of scholarship of teaching.7 The “Instructions for Authors” contain detailed specifications. The most important part of an “Innovation” study is demonstration that it is indeed innovative. This necessitates documentation of a thorough literature search. Evaluations of activities that have already been described might merit publication as “Original Research,” but they are not appropriate as Innovations. Yet even when an idea has never been previously described, a diligent search will invariably identify previous work (empiric and theoretical) to support the approach followed. Scholarly innovations do not appear from thin air; they build on prior work.
Authors must describe the innovation, including both the educational objectives and the innovation itself, sufficiently well that a reader could implement or adapt the innovation at his/her own institution. As most of these articles represent the scholarship of teaching rather than research, a rigorous evaluation of the innovation is not mandatory. However, only the most innovative and best prepared ideas will merit publication without adequate evaluation. As the degree of innovation goes down, the evaluation rigor must go up. Even then, the key to a successful Educational Innovation publication will be a novel, well-described idea that addresses an important need, has an adequate theoretical/empirical foundation, and builds on prior work.
Authors must demonstrate reflective critique by discussing what went well, what did not work as planned, how and why results vary from other studies, and areas for improvement and future research. Honesty and candor are not penalized. Indeed, an innovation with neutral or unexpectedly negative effects may have as much or more importance in publication as an innovation with statistically significant positive effects. However, the usual caveats of sample size, sensitivity of outcome measures, and strength of intervention apply in studies showing no effect.
Survey Research
Much medical education research relies on a survey as a means of collecting data. Although this is strictly a method rather than a study type, its ubiquity justifies a brief discussion.
Surveys are subject to various sources of bias. They are susceptible to researcher bias in the wording of questionnaire items and the sample selection. Low response rates also introduce possible bias. Surveys often generate large amounts of data, which introduce the danger of bias from conducting multiple statistical analyses and then reporting only the statistically significant findings. Lengthy surveys can also breed long Results sections in which key points are lost amidst excessive data. We propose the following as a starting point for studies using surveys (in addition to the general standards of scholarship noted above) and refer authors to other sources for further guidance.26,55
The research question should be clearly stated and justified. This will focus authors’ collection, analysis, and presentation of data, and also ensure that the survey addresses an important issue.
Based on the research question, a study sample should be selected to reflect the population to which results will be generalized.
The questionnaire must have evidence to support the validity of its scores for answering this research question (see guidelines for the development and evaluation of assessment tools detailed below). If the study uses a previously published instrument, validity evidence should be concisely summarized and referenced. If the instrument is new, the study at a minimum should report evidence of content (breadth and depth of coverage of topic, systematic development process, qualifications of item writers, expert review, and pilot testing) and score reliability (from pilot testing, actual administration, or both).
Authors should report information on the format of survey administration (mail, Web, phone, other) and describe methods used to encourage response. Although there is no universal definition of adequate response rate, authors should consider the response rate achieved when interpreting results.
It is virtually impossible to avoid investigator bias in studies that conduct multiple analyses and then report only those that are significant or interesting. The best defense against such problems is to develop a focused research question and plan all analyses in advance. When reporting, authors should describe all analyses conducted (including those whose results are not reported). Authors should account for multiple independent comparisons using methods such as omnibus tests of statistical significance or Bonferroni’s adjustment.36
The Results and Discussion should highlight key points that support a clear message.
Authors should generally report verbatim the survey questions, along with any scoring rubrics, either in a table (reporting questions and results in the same table) or as an appendix. It is rarely necessary to publish the actual instrument; reporting only the questions saves space. If not all questions are reported, authors should report at least a few examples of typical questions.
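The Bonferroni adjustment mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended analysis plan: each p-value is multiplied by the number of comparisons (capped at 1) and compared with the chosen significance level. The function name, the significance level of 0.05, and the p-values are hypothetical; omnibus tests or other adjustments may suit a given study better.

```python
def bonferroni_adjust(p_values, alpha=0.05):
    """Return a list of (adjusted_p, is_significant) pairs, one per comparison.

    Bonferroni's adjustment multiplies each p-value by the number of
    comparisons m and caps the result at 1.0.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)
        results.append((adjusted, adjusted <= alpha))
    return results


# Hypothetical p-values from five planned comparisons in one survey.
p_values = [0.004, 0.020, 0.030, 0.250, 0.600]

for raw, (adjusted, significant) in zip(p_values, bonferroni_adjust(p_values)):
    label = "significant" if significant else "not significant"
    print(f"raw p = {raw:.3f}, adjusted p = {adjusted:.3f}, {label}")
```

Note that the adjustment depends on the total number of analyses conducted, which is one reason to plan analyses in advance and to report all of them, including those whose results are not shown.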
Needs Analysis Studies
“Needs analyses” are intended to identify the current state of a specific medical education issue. These frequently address potential deficiencies in content knowledge, but can also explore other educational “gaps,” such as work hour violations or inequities in academic promotion. Most studies evaluating educational interventions will have at least a rudimentary needs analysis, but studies designed as needs analyses face a higher bar.
Needs analyses can employ a variety of methods, including surveys and tests, focus groups, chart audits, task analyses, and literature reviews, but all pose challenges. First, such studies are particularly susceptible to researcher biases and special interests. If we looked hard enough, we would likely conclude that every issue in medical education has unmet needs–at least through the eyes of a person with a particular interest in that topic. Second, the results of a needs analysis depend greatly on the participants sampled and the instrument used. Unfortunately, we frequently see needs analyses employing poorly designed measures administered to convenience samples (e.g., a locally developed and administered anatomy exam). Finally, needs analysis studies often collect far more information than can reasonably be (or needs to be) reported and are susceptible to data analysis problems discussed under Survey Research.
Thus, we propose that needs analysis manuscripts meet four minimum requirements (in addition to other relevant standards). First, the research question should be clearly stated and justified. Second, the study sample must reasonably represent the target population (typically a national scope). Since a deficiency at one institution rarely indicates a national need, single-institution needs analyses–though important to an institution–will generally meet with skepticism. Third, the outcome measures must have evidence to support the “plausible” validity of scores. Authors should generally quote verbatim at least a subset of the items including the scoring rubric (i.e., in a table or as an appendix). Fourth, the Results and Discussion should highlight a clear, concise take-home message.
Development and Evaluation of Assessment Tools
Studies describing the development, evaluation, or revision of assessment tools employ a variety of designs, but in all cases the investigators seek to support the validity of an instrument’s scores for making specific inferences.56,57 Rather than try to address all possible study designs, we will discuss a framework or approach to validity that will facilitate high-quality studies.
The current conceptualization of validity unifies all different “types” of validity (content, criterion, construct, etc.) as “construct validity.”6,58–61 Instruments are intended to generate scores reflecting some underlying construct, and validity is the degree to which scores truly reflect that construct.6 We emphasize that validity is a property of scores, not instruments. Instruments are not inherently valid, but rather scores are valid for a particular purpose.
Validity is best viewed as a hypothesis supported by evidence from various sources.61 As with any hypothesis, validity cannot be proven. Rather, investigators should create a validity argument62 by first stating an initial hypothesis about what construct the scores should reflect; second, collecting evidence (see below) to support or refute that hypothesis (ideally testing the weakest assumption first); third, revising the hypothesis (either the instrument, the construct, or the context of application) if needed; and fourth, repeating the second and third steps until sufficient evidence has been collected to support (or reject) the validity argument. The sufficiency of evidence will vary depending on the application (e.g., a high-stakes Boards exam will require more evidence than a medical school second-year midterm). The evidence should answer the question: is it plausible that the scores reflect the intended construct?
There are five currently accepted sources of validity evidence6: content (how well does the instrument match the intended construct domain?), response process (how do idiosyncrasies of the actual responses affect scores?), internal structure (typically psychometric data such as reliability or factor analysis), relations to other variables (how do scores relate to other variables that purport to measure a similar or different construct?), and consequences (do the scores make a difference?63). A publishable validity study will present data from several (but rarely all) complementary sources of evidence,64 and ideally address the most critical or questionable aspects of the validity argument. Instruments intended for broad use often warrant a series of studies. Other sources contain further details and examples.56,58,59,61
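Score reliability is one common form of internal structure evidence. As an illustration only, the sketch below computes Cronbach's alpha, one widely used internal consistency coefficient, as k/(k-1) × (1 − sum of item variances / variance of total scores). The text does not prescribe a particular coefficient; the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha from a list of respondents' item scores.

    `responses` is a list of rows; each row holds one respondent's scores
    on the same k items. Sample variances (n - 1 denominator) are used.
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("each respondent needs the same k >= 2 item scores")

    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: five respondents, four items scored 1 to 5.
scores = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```

A single coefficient of this kind addresses only one source of evidence; it does not by itself establish that scores reflect the intended construct.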
We discourage use of the term “face validity”61,65 and note that this term is frequently misused to allude to content evidence.
Investigators can employ a similar approach when evaluating or adapting instruments for use in a particular study. When reporting the use of a previously described instrument, authors should briefly summarize the evidence supporting its scores for this application. For example, authors might write, “Felder and Solomon developed the Index of Learning Styles (ILS) to assess the ... learning style dimensions defined by Felder and Silverman. ... [Studies] have used internal consistency, test-retest reliability, and factor analysis to support the internal structure of ILS scores. ILS scores have also been shown to discriminate college students with different majors and college students from faculty.”66
Although much education research evaluates the outcomes of specific interventions, we touch only lightly on this study type because other sources25–28,67,68 provide adequate guidance for authors. Guidelines developed for behavioral interventions in clinical medicine and public health, such as the TREND guidelines,69 STROBE statement,70 and CONSORT extension,71 are also relevant. Authors should highlight an empirical or theoretical grounding for the intervention, focus on a gap in theory or educational practice, and conduct an appropriate evaluation using outcomes aligned with both the educational intervention and the study goals. Conceptual frameworks are useful for both applied and theory-building work.24
Randomized designs are not required, but authors must carefully consider relevant validity threats.38,67
Qualitative research will continue to proliferate as researchers recognize the limitations of quantitative methods in answering many important questions and gain necessary skills.72 JGIM has long supported such studies.73 However, such studies must adhere to rigorous standards.30,74–79
Key standards include a focused research question; appropriate sampling and data collection methodologies; inductive analytic methods that promote trustworthiness, credibility, dependability, and transferability (duplicate coding, triangulation, member checks, saturation, peer review, etc.); results that demonstrate a clear logic of inquiry and present appropriate data (i.e., themes and supporting quotations); and a synthesis with clear conclusions. We encourage use of accepted qualitative paradigms or approaches (grounded theory, ethnography, discourse analysis, etc.). Mixed methods approaches (using both quantitative and qualitative methods) can often answer questions better than either approach alone.