OCRe is an OWL 1.1 ontology that focuses on the design and analysis of human studies. Its scope includes human investigations of any design type (e.g., interventional, observational) for any intent (e.g., therapeutic, diagnostic, preventive) in any clinical domain on any type of data (e.g., clinical, imaging, genomics). OCRe includes 1) a representation of the structure of human studies and associated entities, 2) informational entities (e.g., study protocols), 3) terms for describing study characteristics, and 4) bindings to standard terminologies (e.g., SNOMED CT).
OCRe is organized as a set of modular components related by their import relationships. The research module imports the clinical and study_protocol modules to describe a study. The study_protocol module imports from the BRIDG model (4) terms that specify temporal aggregates (e.g., epochs and arms) and sequencing relationships among protocol-driven activities.
Figure: Ontology of Clinical Research modules.
OCRe modules are independent of any clinical domain because the clinical content is expressed through external ontologies and terminologies such as NCI Thesaurus or SNOMED CT. OCRe interfaces with these terminologies by relating OCRe entities (e.g., outcome phenomenon) to external concepts (e.g., acute myocardial infarction) and their associated terminology codes (e.g., the SNOMED CT code for acute myocardial infarction).
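The binding between an OCRe entity and an external terminology concept can be pictured as a simple record. The following Python sketch is illustrative only: the class names, the `codes_for` helper, and the code value `SCT-PLACEHOLDER` are hypothetical and are not part of OCRe or of SNOMED CT.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExternalConcept:
    terminology: str  # name of the external terminology, e.g. "SNOMED CT"
    code: str         # terminology code (a placeholder is used below)
    label: str        # human-readable concept name


@dataclass(frozen=True)
class Binding:
    ocre_entity: str          # OCRe entity, e.g. "outcome phenomenon"
    concept: ExternalConcept  # external concept the entity is related to


def codes_for(bindings: List[Binding], ocre_entity: str, terminology: str) -> List[str]:
    """Return the codes bound to an OCRe entity in one terminology."""
    return [
        b.concept.code
        for b in bindings
        if b.ocre_entity == ocre_entity and b.concept.terminology == terminology
    ]


# Placeholder code: substitute the real terminology code in practice.
ami = ExternalConcept("SNOMED CT", "SCT-PLACEHOLDER", "acute myocardial infarction")
bindings = [Binding("outcome phenomenon", ami)]
```

Keeping the clinical concept in a separate record mirrors the separation described above: the OCRe side stays domain-independent, and only the binding changes when a different terminology is used.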
In the next sections, we discuss OCRe’s modeling of several key domains of clinical research.
Study Design Typology
We postulated that there exist a small number of high-level study design types that represent distinct approaches to human investigations, and that we could reliably classify all human studies into these design types. Since each study type is subject to a distinct set of biases and interpretive pitfalls, a study’s design type would strongly inform the interpretation and reuse of its data and biosamples.
Through iterative consultation with statisticians and epidemiologists, we defined a typology of study designs based on discriminating factors that define mutually exclusive and exhaustive study types (hybrid studies can be of more than one type). We use these factors as questions in a web-based classification tool (5). Our tool first classifies studies into human and non-human studies (“Does the study use or collect measurements, assessments, or observations about individual humans?”). It then classifies human studies into qualitative or quantitative studies, and subsequently classifies quantitative studies into four interventional or four observational high-level design types (highlighted in red in the design typology figures).
For interventional studies, discriminating factors include whether the investigator has a choice of interventions to which they can assign participants, whether the main comparison is within or across participants, and whether intervention assignment and data analysis are only within a single participant. Additional descriptors elaborate on secondary design features (e.g., randomization, blinding) that introduce or mitigate additional interpretive concerns.
For observational studies, the four design types are based on whether the main control group is defined by case (outcome) or exposure (predictor) status, whether the case and control are the same person, and whether outcomes are measured at the same time as predictors or afterward. A different set of additional descriptors from those used for interventional studies applies to these observational study types (e.g., retrospective or prospective). The design typology is formalized in OCRe as an OWL hierarchy.
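The discriminating factors above can be read as a small decision procedure. The Python sketch below is one possible reading, not the classification tool itself. The design-type labels (N-of-1, single group, crossover, parallel group, case-crossover, case-control, cross-sectional, cohort) are conventional epidemiological names assumed here rather than taken from the text, and the field names and the order in which the questions are asked are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudyAnswers:
    # Top-level questions
    about_individual_humans: bool
    quantitative: bool = True
    interventional: bool = True
    # Interventional discriminating factors
    single_participant_only: bool = False
    choice_of_interventions: bool = True
    comparison_within_participants: bool = False
    # Observational discriminating factors
    control_defined_by_case_status: bool = False
    case_and_control_same_person: bool = False
    outcomes_measured_with_predictors: bool = False


def classify(a: StudyAnswers) -> str:
    """Map answers to a design-type label (labels are assumed, see lead-in)."""
    if not a.about_individual_humans:
        return "non-human study"
    if not a.quantitative:
        return "qualitative study"
    if a.interventional:
        if a.single_participant_only:
            return "N-of-1"
        if not a.choice_of_interventions:
            return "single group"
        return "crossover" if a.comparison_within_participants else "parallel group"
    # Observational branch
    if a.control_defined_by_case_status:
        return "case-crossover" if a.case_and_control_same_person else "case-control"
    return "cross-sectional" if a.outcomes_measured_with_predictors else "cohort"
```

For example, `classify(StudyAnswers(about_individual_humans=True, interventional=False, control_defined_by_case_status=True))` follows the observational branch and returns the label assumed here for a design whose controls are defined by outcome status. A hybrid study, which the typology allows to be of more than one type, would need a list-valued result instead of a single label.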
OCRe uses Eligibility Rule Grammar and Ontology (ERGO) Annotation (6) to capture the clinical content of eligibility criteria in machine-readable form. ERGO Annotation is a declarative representation of eligibility criteria that is informed by both the complexity of natural language and the requirements for computability. ERGO Annotation models three statement types: 1) simple statements making single assertions, 2) statements about quantitative comparisons, and 3) complex statements, which are simple and/or comparison statements joined by Boolean connectives or semantic connectors (e.g., evidenced_by).
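The three statement types can be sketched as a small recursive data structure. This Python example is an assumption-laden illustration, not the ERGO Annotation syntax: the class names, the `evaluate` function, and the example criterion are hypothetical, and only Boolean connectives are evaluated because the text does not define evaluation rules for semantic connectors such as evidenced_by.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Set, Union


@dataclass
class Simple:
    assertion: str  # a single assertion, e.g. "type 2 diabetes"


@dataclass
class Comparison:
    quantity: str   # measured quantity, e.g. "age"
    op: str         # one of "<", "<=", ">", ">=", "=="
    value: float


@dataclass
class Complex:
    connective: str            # "and", "or", or a semantic connector
    parts: List["Statement"]   # simple, comparison, or nested complex statements


Statement = Union[Simple, Comparison, Complex]

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


def evaluate(stmt: Statement, facts: Set[str], measurements: Dict[str, float]) -> bool:
    """Evaluate a statement against asserted facts and numeric measurements."""
    if isinstance(stmt, Simple):
        return stmt.assertion in facts
    if isinstance(stmt, Comparison):
        return _OPS[stmt.op](measurements[stmt.quantity], stmt.value)
    if stmt.connective == "and":
        return all(evaluate(p, facts, measurements) for p in stmt.parts)
    if stmt.connective == "or":
        return any(evaluate(p, facts, measurements) for p in stmt.parts)
    # Semantic connectors are representable but not evaluated in this sketch.
    raise ValueError(f"no evaluation rule for connector {stmt.connective!r}")


# Illustrative criterion: an adult with type 2 diabetes.
criterion = Complex("and", [Simple("type 2 diabetes"), Comparison("age", ">=", 18)])
```

Representing complex statements as a tree keeps the structure declarative: the same criterion object could be rendered as text, translated into a query, or evaluated as shown here.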
Study Outcomes and Analyses
In OCRe, the study protocol specifies the study activities to achieve the study’s scientific objectives, such as the collection and analysis of study data. The accompanying figure shows our conceptualization of the entities related to outcomes and analyses in human research. We first define a study phenomenon as “a fact or event of interest susceptible to description and explanation.” Study phenomena are represented by one or more specific study variables that may be derived from other variables. For example, the study phenomenon of cardiovascular morbidity may be represented as a composite variable derived from cardiovascular death, myocardial infarction (MI), and stroke variables. Each variable can be further described by its type (e.g., dichotomous), coding (e.g., death or not), timepoints of assessment (e.g., 6 months after index MI), and assessment method (e.g., death certificate). All variables are associated with participant-level and study-level observations (observations aggregated across subjects).
A study protocol may specify several analyses, each having dependent and independent variables that represent various study phenomena. Variables may play the role of dependent or independent variables in different analyses. If the study protocol designates a primary analysis, the dependent variable of that analysis represents what is conventionally known as the primary outcome of the study. To our knowledge, OCRe is the first model to disambiguate study phenomena of interest from the variables that code observations of those phenomena, and from the use of those variables in study analyses. This clarity of modeling should provide a strong ontological foundation for scientific query and analysis in HSDB.
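The separation between phenomena, variables, and analyses can be made concrete with a few record types. The Python sketch below is a hypothetical rendering for illustration: the class and field names, the `primary_outcome` helper, and the treatment variable are assumptions and do not reproduce the OCRe OWL classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    name: str
    phenomenon: str               # study phenomenon the variable represents
    var_type: str = "dichotomous"
    coding: str = ""
    timepoint: str = ""
    assessment_method: str = ""
    derived_from: List["Variable"] = field(default_factory=list)


@dataclass
class Analysis:
    dependent: Variable
    independent: List[Variable]
    primary: bool = False


@dataclass
class StudyProtocol:
    analyses: List[Analysis]


def primary_outcome(protocol: StudyProtocol) -> Optional[Variable]:
    """Return the dependent variable of the primary analysis, if one is designated."""
    for analysis in protocol.analyses:
        if analysis.primary:
            return analysis.dependent
    return None


# Component variables from the cardiovascular morbidity example in the text.
cv_death = Variable("cv_death", "cardiovascular death", coding="death or not",
                    assessment_method="death certificate")
mi = Variable("mi", "myocardial infarction", timepoint="6 months after index MI")
stroke = Variable("stroke", "stroke")
cv_morbidity = Variable("cv_morbidity", "cardiovascular morbidity",
                        derived_from=[cv_death, mi, stroke])

treatment = Variable("treatment", "assigned intervention")  # hypothetical predictor

protocol = StudyProtocol([
    Analysis(dependent=cv_morbidity, independent=[treatment], primary=True),
    # The same variable can be independent here and a component of the outcome above.
    Analysis(dependent=stroke, independent=[mi]),
])
```

Here the primary outcome is never stored as an attribute of a variable; it is derived from which analysis the protocol designates as primary, which reflects the distinction drawn above between a variable and its role in an analysis.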