The emerging field of comparative effectiveness research (CER) is beset by differences in language among stakeholders. These include methodologists in organizations that promote CER, scientists who generate original data or synthesize secondary data, panels of experts who rely on extant research to design guidelines for best practice, and policymakers who identify and prioritize future research needs. For health sciences librarians who regularly support this panoply of stakeholders, it is necessary to understand these differences in order to interpret service requests. For example, the following terms are used inconsistently: CER, evidence-based medicine (EBM), and health technology assessment (HTA); randomization and random sampling; efficacy and effectiveness.
Recently, the MLA News published two accessible reports introducing librarians to CER, in which the authors compare CER and EBM [2]. A more thorough essay comparing CER, EBM, and HTA along several dimensions appears in The Milbank Quarterly, with some discussion of semantic differences between North America and Europe. The authors of another paper, discussing infrastructure needs and capacity for conducting CER, report that while capacity is adequate, the "majority of researchers are trained in either observational study methods or randomized trials, but rarely both" [5].
Thus, a lack of awareness of major approaches to research likely exacerbates the confusion in language. Note that in this paper, we use the term language to mean natural as opposed to formal language, with a focus on the use of phrases to communicate concepts for study designs. Jurafsky and Martin's text explains the distinction [6]. Several authors provide background papers on the structure of scientific language, sublanguages, and epistemological differences among disciplines [7].
An important aspect of CER is its focus on the generalizability of findings to diverse, real-world populations of interest. Broadly, CER is concerned with answering questions about the effectiveness rather than the efficacy of interventions, which has implications for the usefulness of various study designs. Nonrandomized (NR) or observational studies, rather than randomized controlled trials (RCTs), may better answer effectiveness questions, even though well-known threats to validity exist for the former [10]. For example, consider that a well-conducted RCT ensures the statistical equivalence of groups via randomization (random assignment of treatments to experimental units or vice versa) prior to treatment, so that a treatment effect, once found, is likely to be reproducible under the same experimental conditions. However, the design of an RCT promotes internal validity at the expense of external validity (generalizability) when the investigators cannot randomly sample "units," such as patients. In contrast, researchers who conduct an NR study might randomly sample participants from populations of interest. Random sampling, if done well, as opposed to random assignment, ensures that study groups will resemble the populations of interest. This is a major reason for recognizing the value of evidence derived from NR studies. In the best of worlds, a CER question would be answered by both RCTs and NR studies, which is why systematic reviewers who synthesize biomedical evidence look for both kinds of studies.
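The distinction between random sampling and random assignment can be made concrete with a minimal sketch. The population, sample sizes, and identifiers below are entirely hypothetical; the point is only that the two operations answer different questions, sampling supporting external validity and assignment supporting internal validity:

```python
import random

# Hypothetical sampling frame: identifiers for a large population of interest.
population = [f"patient_{i}" for i in range(10_000)]

# Random sampling: draw participants so the sample resembles the
# population of interest (supports external validity / generalizability).
sample = random.sample(population, 200)

# Random assignment: allocate the sampled participants to study arms
# so the arms are statistically equivalent (supports internal validity).
shuffled = list(sample)
random.shuffle(shuffled)
treatment_arm = shuffled[:100]
control_arm = shuffled[100:]
```

An RCT without a sampling frame performs only the second step on whoever enrolls; an NR study may perform only the first. A study that could do both would address efficacy and effectiveness questions at once.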
Unfortunately, consensus does not exist regarding how best to describe NR studies common to CER. According to the Cochrane Non-Randomised Studies Methods Group, both investigators and indexers describe study designs inconsistently [11]. Challenges arise for expert searchers, indexers, and methodologists due to the hodgepodge of terms that stakeholders use within and across disciplines. This problem is well known, and groups around the world have issued statements regarding standards for reporting studies and their designs. To improve the value of medical research, an international initiative known as the EQUATOR Network [12] maintains a library of reporting guidelines by study type, such as the STAndards for the Reporting of Diagnostic accuracy studies (STARD) [13], the Consolidated Criteria for Reporting Qualitative Research (COREQ) [14], and the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) [15]. In general, guidelines suggest that authors name their study design in the title or abstract and use a common term, but names are not standardized. Thus, inconsistent indexing and varying stakeholder language, as well as multiple reporting standards, lead to serious retrieval challenges for health sciences librarians.
In this study, we investigated whether, and to what extent, methodologists in several highly regarded CER organizations share a terminology for study designs. By terminology, we mean a set of terms, mostly phrases, which is consistent with the International Organization for Standardization (ISO) 1087 "Terminology–Vocabulary" standard described by Hammond and Cimino [16]. To compare organizational terminologies, we culled design terms and terms for related concepts from relevant documents. We then built a CER design terminology based on the documents we identified to evaluate whether and how terms for study designs used by experts correspond to terms in Medical Subject Headings (MeSH) [17] and Emtree [18], the controlled vocabularies for MEDLINE and Embase, respectively. To support librarians, we developed a crosswalk between MeSH and Emtree, with suggestions for queries when design terms map only partially to broad controlled terms or fail to map. We also explored whether scientists use CER design terms to describe their own studies.
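The idea of a crosswalk with query fallbacks can be sketched as follows. This is not the crosswalk developed in this study: the entries and mappings below are illustrative (the cohort and case-control mappings reflect commonly known MeSH/Emtree terms, while the unmapped entry and the query function are assumptions), and real query syntax varies by search platform:

```python
# Illustrative crosswalk sketch: a design term may map to a MeSH heading,
# an Emtree term, both, or neither. "None" marks a failed mapping, for
# which a free-text (title/abstract) query is suggested instead.
crosswalk = {
    "cohort study": {
        "mesh": "Cohort Studies",      # commonly known MeSH heading
        "emtree": "cohort analysis",   # commonly known Emtree term
    },
    "case-control study": {
        "mesh": "Case-Control Studies",
        "emtree": "case control study",
    },
    "interrupted time series": {
        "mesh": None,                  # example of a design term with
        "emtree": None,                # no direct controlled-term mapping
    },
}

def suggest_query(term):
    """Return a controlled-vocabulary query when a MeSH mapping exists,
    else fall back to a free-text title/abstract query (hypothetical
    PubMed-style field tags)."""
    entry = crosswalk.get(term, {})
    if entry.get("mesh"):
        return '"{}"[MeSH]'.format(entry["mesh"])
    return '"{}"[Title/Abstract]'.format(term)
```

A librarian-facing crosswalk would, in effect, enumerate such mappings for every design term in the terminology, flagging the partial and failed mappings that force a free-text search strategy.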