4.1. Comparison to TCS, TimeML, and ERGO
No existing KR for temporal expressions met our requirements. We developed EliXR-TIME by reusing existing temporal KRs and named its classes and slots according to their functional and semantic types. Below, we compare EliXR-TIME with related temporal schemas and ontologies.
The TCS did not facilitate recursive temporal patterns in clinical text.11 In EliXR-TIME, we have a Frequency Constraint class that supports expressions such as daily or 5 days per week. For a clinical note, the TCS evaluated not only temporal information in the note, e.g., 6 days before a visit, but also the date and time of the note itself. However, in eligibility criteria, the actual time-stamped date of a visit will not be known until each individual patient’s record is queried. Therefore, visit must be explicitly defined in order to perform these short time-interval calculations; this was handled by our TAE class.
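The deferred resolution described above can be illustrated with a minimal sketch. It assumes a hypothetical patient record in the form of a dictionary that maps an anchor name such as "visit" to a date; the function name `resolve_offset` and its parameters are illustrative and are not part of EliXR-TIME.

```python
from datetime import date, timedelta


def resolve_offset(record: dict, anchor: str, days: int, time_lag: str) -> date:
    """Turn an expression like '6 days before a visit' into a concrete date.

    The anchor date is looked up only at query time, because it differs
    for every patient. `record` is a hypothetical patient record.
    """
    anchor_date = record[anchor]
    delta = timedelta(days=days)
    if time_lag == "before":
        return anchor_date - delta
    if time_lag == "after":
        return anchor_date + delta
    raise ValueError(f"unsupported time lag: {time_lag!r}")


# The same criterion yields different dates for different patients.
patient_a = {"visit": date(2024, 5, 10)}
patient_b = {"visit": date(2024, 1, 3)}
date_a = resolve_offset(patient_a, "visit", 6, "before")
date_b = resolve_offset(patient_b, "visit", 6, "before")
```

The point of the sketch is only that the anchor remains symbolic in the criterion and becomes a date when a specific record is queried.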
TimeML is a highly detailed temporal specification modeling language. EliXR-TIME differs from TimeML by emphasizing the integrity of a semantic unit. For example, a sentence segment from TimeBank 1.2 is “at least 30 days before closing the purchase”. The TimeML annotation does not generally represent it as a single semantic unit but groups “at least 30 days” into a TIMEX3 expression, while “closing” and “purchase” are labeled with type EVENT; “before” would be represented as a SIGNAL and a TLINK would link a temporal relation to the EVENT. In EliXR-TIME, we label the entire phrase as a single TAE with quantitative_concept = at least 30 days, time_lag = before, and anchor_point = closing purchase. This allows the construction of a hierarchical representation of the entire phrase. In contrast, TimeML only permits annotations of individual terms connected by SIGNALs or LINKs and not annotations of an entire phrase. Furthermore, TimeML is designed to capture temporal information related to types of statements, such as those that use intention verbs like feel, love, hope, believe, and suspect. However, these types of statements are not relevant for eligibility criteria because inclusion and exclusion criteria state conditions, diseases, actions, etc. that a prospective patient has had in the past. In EliXR-TIME, we sought to focus only on representing the semantic types that were necessary for annotating the information contained within the text. We omitted unnecessary semantic types in an attempt to balance expressiveness and tractability for knowledge representation.
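As an illustration of the single-unit annotation described above, the following sketch encodes the TimeBank phrase as one nested frame. The slot names `quantitative_concept`, `time_lag`, and `anchor_point` come from the text; splitting the quantitative concept into a comparator, a value, and a unit is an assumption made here for illustration and may not match the actual EliXR-TIME schema.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class QuantitativeConcept:
    comparator: str  # e.g., "at least" (assumed decomposition)
    value: int
    unit: str


@dataclass(frozen=True)
class TAE:
    quantitative_concept: QuantitativeConcept
    time_lag: str      # e.g., "before"
    anchor_point: str  # e.g., "closing purchase"


# "at least 30 days before closing the purchase" as one semantic unit
tae = TAE(
    quantitative_concept=QuantitativeConcept("at least", 30, "days"),
    time_lag="before",
    anchor_point="closing purchase",
)

# asdict() converts nested dataclasses recursively, giving a hierarchical frame.
frame = asdict(tae)
```

Keeping the whole phrase in one object is what makes the hierarchical view possible; a TimeML-style annotation would instead hold the TIMEX3, SIGNAL, and EVENT spans as separate linked items.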
Another related project is the Eligibility Rule Grammar and Ontology (ERGO) annotation.2 ERGO does not represent temporal information in eligibility criteria, though an extension under development, the Generalized ERGO Annotation, does include temporal constraints on the main noun phrase of an eligibility criterion. EliXR-TIME differs from ERGO in that EliXR-TIME focuses on defining the semantic types (e.g., events and anchors) for sentence segments in temporal eligibility criteria and the combination patterns, or frame-based templates, of these semantic types. In contrast, ERGO defines the constraint types logically, which often requires intelligent translation or mapping from sentence segments to constraint types.
4.2. Limitations
This study has two limitations. First, we instantiated only a relatively small sample of eligibility criteria containing temporal expressions, with a data set of 70 temporal eligibility criteria: 50 in the training set and 20 in the evaluation set. However, each of the 70 criteria was unique, i.e., non-redundant, and together they represented a variety of temporal expression constructions. We are confident that EliXR-TIME represents most of the temporal expressions found in eligibility criteria. Second, the two raters who instantiated the test criteria were also the developers; therefore, the general usability and reliability across raters independent of the development team remain to be proven. However, a separate study successfully demonstrated the potential of using this model to develop a conditional random fields algorithm to automatically extract and annotate temporal expressions from eligibility criteria.17
4.3. Future Work
Our future work involves combining EliXR-TIME with information extraction tools to automatically chunk and annotate semantic segments in temporal expressions across a large eligibility criteria corpus.
We identified four research challenges for future study.
First, expressions containing medically specific temporal information can be implicit. For example, the inherent meaning of cancer in remission contains temporal information including the idea that a diagnosis of cancer was made in the past, treatment (e.g., surgery, chemotherapy, radiation) was performed, negative lab test results indicating absence of cancer have been received, and finally a certain time period has elapsed since a positive lab test result. This type of expression contains important temporal information and should be represented as a past event relative to “Now”, where now is the time of inference.
Second, for many criteria, translating English into logic can be daunting because a word such as during can be mapped to multiple Allen temporal relations. Therefore, we did not limit EliXR-TIME to one rigidly structured temporal representation per criterion but made it sufficiently flexible to allow multiple temporal relations to exist between an event and an anchor. For example, the criterion Administration or planned administration of immunoglobulins and / or blood products during a period starting from 3 months prior to administration of the vaccine and ending at study conclusion can be represented by five Allen relations: equals, during, finishes, starts, and during inverse (Figure 3).
Figure 3. Multiple valid Allen temporal relations for the criterion: Administration or planned administration of immunoglobulins and / or blood products during a period starting from 3 months prior to administration of the vaccine and ending at study conclusion.
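The one-to-many mapping from the English word during to Allen relations can be sketched as follows. This is an illustrative classifier for two proper intervals, not the EliXR-TIME implementation. Intervals are given as integer day offsets from vaccine administration; treating 3 months as 90 days and placing study conclusion at day 365 are assumptions made only for the example.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    a0, a1 = a
    b0, b1 = b
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "during-inverse"
    return "overlaps" if a0 < b0 else "overlapped-by"


# The five relations the text lists as valid readings of "during".
VALID_FOR_DURING = {"equals", "during", "finishes", "starts", "during-inverse"}

# Anchor period: 3 months (assumed 90 days) before vaccination to
# study conclusion (assumed to fall on day 365).
period: Interval = (-90, 365)


def matches_criterion(administration: Interval) -> bool:
    return allen_relation(administration, period) in VALID_FOR_DURING
```

Under these assumptions, an administration spanning days 0 to 10 is classified as during and matches, whereas one spanning days -200 to -100 is classified as before and does not.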
Third, mapping between criterion sentence constituents, on one hand, and EliXR-TIME classes and attributes, on the other, can still be challenging. The difficulty lies in mapping the words and phrases in criterion text to a structure that accurately represents their semantic meaning. For instance, the criterion Laboratory confirmed influenza disease within 6 months can be broken down into a TLE containing an event = Laboratory confirmed influenza disease, Allen temporal relation = during, and an anchor = the past 6 months. This criterion presents two challenges: (1) within must be translated into the Allen temporal relation during, and (2) the past tense must be inferred so that 6 months is represented as the past 6 months. Mapping a fixed duration such as 6 months to an implied interval such as the past 6 months can be problematic because in some criteria the fixed duration is intended, while in others the implied past interval is intended.
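The implied-interval reading of within 6 months can be sketched as a window ending at the time of inference. The helper names below are hypothetical, the event is simplified to a single date, and the window endpoints are treated as inclusive; all three are assumptions for illustration.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Shift a date back by whole calendar months, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def in_past_window(event_date: date, now: date, months: int) -> bool:
    """Implied-interval reading: the event falls within the past `months` months."""
    return months_before(now, months) <= event_date <= now


now = date(2024, 3, 15)  # time of inference, chosen arbitrarily for the example
recent = in_past_window(date(2023, 12, 1), now, 6)
too_old = in_past_window(date(2023, 8, 1), now, 6)
```

The fixed-duration reading would need a different check, for example on how long the event lasted, which is why the two readings have to be disambiguated before the criterion can be evaluated.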
Fourth, a criterion may contain an English word that is also the name of an Allen temporal relation, such as before or after, but whose contextual meaning does not correspond to its Allen meaning. Deciding when such a word should be represented by the Allen temporal relation of the same name can be difficult. For example, the criterion subject has agreed to continue adequate contraception during the entire treatment period and for 2 months after completing of the vaccination series contains two TLEs, both of which use the Allen temporal relation during. The first TLE captures the meaning that contraception be used during the entire treatment period, and the second TLE captures the meaning that contraception be used during a relative time interval with begin_point = end of the treatment period and end_point = 2 months after end of treatment period. In this context, after indicates a time lag and not an Allen temporal relation.
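The two TLEs in this example can be written out as a small data-structure sketch. The slot names `begin_point` and `end_point` follow the text; the class names `TimeLag` and `RelativeInterval` and the remaining field names are hypothetical and are used only to show where the word after ends up.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TimeLag:
    value: int
    unit: str
    direction: str  # "before" or "after"; a lag direction, not an Allen relation
    reference: str


@dataclass(frozen=True)
class RelativeInterval:
    begin_point: str
    end_point: TimeLag


@dataclass(frozen=True)
class TLE:
    event: str
    allen_relation: str
    anchor: Union[str, RelativeInterval]


event = "adequate contraception"

# First TLE: contraception during the entire treatment period.
tle_1 = TLE(event, "during", "entire treatment period")

# Second TLE: contraception during the 2 months that follow treatment.
tle_2 = TLE(
    event,
    "during",
    RelativeInterval(
        begin_point="end of treatment period",
        end_point=TimeLag(2, "months", "after", "end of treatment period"),
    ),
)
```

Both TLEs carry the relation during, and after appears only inside the time lag that defines the end point of the second anchor.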