System overview
The authors applied their existing, locally developed KnowledgeMap concept identifier (KMCI) to detect concepts related to colonoscopy that were documented in clinical EMR notes. The KMCI, a general-purpose biomedical NLP system supporting concept identification and negation, has been in use at Vanderbilt and other sites over the past 9 years.26–28
In the current study, the authors developed new algorithms to identify and interpret time descriptors (eg, ‘6/2003’ or ‘5 years ago’) and properly associate them with corresponding clinical events, such as colonoscopies, and to assign values for certainty and status (eg, ‘never had colonoscopy’ or ‘discussed a colonoscopy’) for each identified colonoscopy concept occurrence. The accompanying figure illustrates how the augmented KMCI applies date and status information to a recognized concept. The authors developed and refined the new NLP components on a training set including all text documents from 300 randomly selected patients. The core of the KMCI concept identification algorithm was not modified or tweaked for colonoscopy concepts. After refining the timing and status parameters on the training set, KMCI was applied to a test set composed of all text documents from 200 randomly selected patients whose colonoscopy statuses were unknown to the study at the time of selection, and whose records the system developers had not previously reviewed or analyzed (details follow below). The study employed two board-certified internal medicine physicians to determine the ‘gold standard’ characterizations of colonoscopy timing and status for each test case, by means of a manual review of case-related EMR data.
The first step in the study's NLP ‘pipeline’ is KMCI concept identification, which analyzes an individual EMR document and outputs an XML file containing its highest-ranked ‘recognized’ Unified Medical Language System (UMLS) concepts—including their semantic types, part-of-speech information (derived from a library provided by Cogilex R&D, Inc), noun phrases, normalized word forms, and other information. The study integrated the new KMCI timing and status determination algorithms into a single Perl program. This program first parses the traditional (previous) KMCI output to identify temporal and status ‘tokens’. The program interprets the tokens and then links them to nearby recognized concepts (limited to colonoscopy concepts in the current study). The algorithms use a series of heuristic, linguistic, and semantic rules, as described in more detail below.
Identification of colonoscopy concepts
The authors used concept hierarchies derived from the UMLS metathesaurus to identify relevant concepts pertaining to colonoscopy.29 In addition, the authors manually queried a database of all words found in the clinical notes and messages that had been entered on the 300 training set patients to find other possible synonyms or common abbreviations/misspellings related to colonoscopy. The authors manually identified, with the help of the training set of EMR documents, 26 UMLS concepts (ie, concept unique identifiers) related to colonoscopy (see appendix 1, available online only, for a list of the concept unique identifiers used). Five new terms were added as local synonyms for existing UMLS concepts; they were ‘cscopy’, ‘C scope’, ‘C scopy’, ‘cscope’, and ‘colonscopy’. No changes were made to the core KMCI concept-identification algorithm for this study.
Identification of time descriptors
The authors developed and applied a general-purpose temporal extraction algorithm to identify and assign dates to colonoscopies, using previous work by other investigators as a guide.17,18,30 Time descriptors such as ‘last colonoscopy—5/4/04’ or ‘The patient remembers having a c-scope 5 years ago’ commonly appear in medical narratives. The study KMCI algorithm interprets temporal references in three steps: detection of time descriptors (eg, ‘2002’ and ‘2 years ago’); conversion of these descriptors into a standard representation of date and time; and linkage of time descriptors to the corresponding EMR CRC screening test concept.
The temporal algorithm identifies three categories of date information: fully and partly specified dates (‘3/5/03’ and ‘2002’, respectively); past and future relative date references (‘five years ago’, ‘next week’); and time period references (‘this past year’, ‘3–5 years ago’, or ‘last decade’). The study-related KMCI modifications used sets of regular expressions that grouped temporal phrases into ‘tokens’ when parsing the sentence.
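The regular-expression grouping described above can be illustrated with a minimal Python sketch. The original system was a Perl program whose patterns are not given in the text, so the pattern set, the category labels (`period`, `relative`, `absolute`), and the function name `find_time_tokens` are all assumptions made for illustration only.

```python
import re

NUM = r"(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"
UNIT = r"(?:day|week|month|year)s?"

# Hypothetical patterns, listed in priority order (most specific first).
TEMPORAL_PATTERNS = [
    ("period", re.compile(
        rf"\b\d+\s*[-–]\s*\d+\s+{UNIT}\s+ago\b"
        r"|\b(?:this past|last)\s+(?:week|month|year|decade)\b", re.I)),
    ("relative", re.compile(
        rf"\b{NUM}\s+{UNIT}\s+ago\b|\bnext\s+(?:week|month|year)\b", re.I)),
    ("absolute", re.compile(
        r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{1,2}/\d{2,4}\b|\b(?:19|20)\d{2}\b")),
]

def find_time_tokens(sentence):
    """Return (category, text) pairs in sentence order.

    A span claimed by a higher-priority pattern is not matched again, so
    '3-5 years ago' is one period token rather than also '5 years ago'.
    """
    found, taken = [], []
    for category, pattern in TEMPORAL_PATTERNS:
        for m in pattern.finditer(sentence):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a token that was already extracted
            taken.append(m.span())
            found.append((m.start(), category, m.group(0)))
    return [(category, text) for _, category, text in sorted(found)]
```

Ordering the patterns from most to least specific is one simple way to keep a range such as ‘3–5 years ago’ from being split into smaller tokens; the source does not say how the study's program resolved such overlaps.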
After token extraction, the system normalized dates into a standard ‘year–month–day’ format. Ambiguity was represented with placeholders: ‘2004–03–XX’ for ‘March 2004’ or ‘199X’ for the decade of the 1990s. For dates specified by a time range (‘2–4 years ago’), the algorithm approximated the date by selecting the midpoint between the two dates defining the interval. The system also stored the start and stop dates specifying the interval.
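As a rough illustration of this normalization step, the following Python sketch converts a few absolute date forms into the placeholder format and computes the midpoint of an interval. The US month/day order, the two-digit-year pivot, the ‘YYYY-XX-XX’ form for a bare year, and the function names are assumptions of the sketch, not details taken from the study.

```python
import re
from datetime import date

def expand_year(text):
    """Expand a 2-digit year; the pivot at 30 is an assumption of this sketch."""
    year = int(text)
    if year >= 100:
        return year
    return 2000 + year if year <= 30 else 1900 + year

def normalize_absolute(text):
    """Return a 'YYYY-MM-DD' string, with X marking unknown parts, or None."""
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{2,4})", text)
    if m:  # assumed US order: month/day/year
        month, day, year = m.groups()
        return f"{expand_year(year):04d}-{int(month):02d}-{int(day):02d}"
    m = re.fullmatch(r"(\d{1,2})/(\d{2,4})", text)
    if m:  # month/year, day unknown
        month, year = m.groups()
        return f"{expand_year(year):04d}-{int(month):02d}-XX"
    m = re.fullmatch(r"\d{4}", text)
    if m:  # year only
        return f"{text}-XX-XX"
    m = re.fullmatch(r"(\d{3})0s", text)
    if m:  # decade, eg '1990s'
        return f"{m.group(1)}X"
    return None

def midpoint(start, stop):
    """Approximate an interval by the date halfway between its endpoints."""
    return start + (stop - start) / 2
```

For example, `normalize_absolute("3/04")` yields `'2004-03-XX'`, and `midpoint(date(2006, 1, 1), date(2008, 1, 1))` yields 1 January 2007. A caller would keep the `start` and `stop` dates alongside the midpoint, as the text describes.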
As described by Zhou and Hripcsak,30 interpretation of relative dates was necessary (eg, ‘five years ago’, ‘last Thursday’). Such descriptors often contain temporal prepositions (‘in’, ‘at’, or ‘on’), adverbs (‘ago’), or temporal phrases (‘in the past’). The authors developed a lexicon of these temporal phrases and their likely indication of past or future dates with respect to the other chart-based time references and concepts in the note. The actual date of the event was then calculated with date subtraction or addition using the note's date of service.
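The date arithmetic against the note's date of service might look like the following Python sketch. It handles only two phrase shapes (‘N units ago’ and ‘next unit’); the number-word table, the 30-day month approximation, and the function names are assumptions, and the study's actual lexicon is not reproduced here.

```python
import re
from datetime import date, timedelta

NUMBER_WORDS = {"a": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def shift(d, n, unit):
    """Move date d by n units; a negative n moves into the past."""
    if unit == "year":
        try:
            return d.replace(year=d.year + n)
        except ValueError:  # 29 Feb shifted into a non-leap year
            return d.replace(year=d.year + n, day=28)
    days_per_unit = {"day": 1, "week": 7, "month": 30}  # month is approximate
    return d + timedelta(days=n * days_per_unit[unit])

def resolve_relative(phrase, service_date):
    """Resolve a relative phrase to a date, or return None if unrecognized."""
    p = phrase.strip().lower()
    m = re.fullmatch(r"(\w+)\s+(day|week|month|year)s?\s+ago", p)
    if m:  # 'ago' signals a past date: subtract from the date of service
        qty, unit = m.groups()
        n = int(qty) if qty.isdigit() else NUMBER_WORDS.get(qty)
        return None if n is None else shift(service_date, -n, unit)
    m = re.fullmatch(r"next\s+(week|month|year)", p)
    if m:  # 'next' signals a future date: add to the date of service
        return shift(service_date, 1, m.group(1))
    return None
```

With a note dated 15 June 2009, `resolve_relative("five years ago", date(2009, 6, 15))` gives 15 June 2004. Phrases outside the two supported shapes, such as ‘last Thursday’, return `None` in this sketch.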
Assignment of temporal references to concepts
After KMCI normalized a temporal expression, it linked the expression to the best-matching target concept (defined by a set of concepts of interest). The algorithm considered each sentence as an independent entity; temporal references by presumption could only modify CRC screening ‘concepts of interest’ contained within the same sentence. The authors had previously determined, by manual review using the training set, that this presumption was rarely violated. Empirical analysis of the training set indicated that defining a ‘window’ of allowed words between a date and event led to worse performance, because other concepts and status phrases often occurred between the event and the date (eg, ‘In 2002, the patient experienced some rectal bleeding and later had a colonoscopy’). The current study used a simple, empirically derived approach that assigned each temporal reference to its nearest ‘event’, as measured by the number of tokens (eg, words or punctuation) between the date reference and the event. An ‘event’ was defined as any UMLS concept in the list of colonoscopy concepts, any concept with semantic types ‘therapeutic or preventive procedure’ or ‘diagnostic procedure’, and any concept containing words such as ‘surgery’ or ‘repair.’ Multiple events in a list joined by a coordinating conjunction (eg, ‘flex sig, mammogram, and colonoscopy in 2001’) received the same date reference. Similarly, the algorithm assigned a list of dates connected by a coordinating conjunction to the same event (eg, ‘colonoscopies in 1995 and 2005’). The algorithm treated intervening semicolons between a date and an event as boundaries that prevented assignment of the date to the event.
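The nearest-event rule with semicolon boundaries can be sketched in Python as follows. The sketch assumes a sentence that is already tokenized, with the positions of date tokens and event tokens supplied by earlier steps; it does not implement the handling of coordinated lists of events or dates, and the function name is hypothetical.

```python
def link_dates_to_events(tokens, date_idx, event_idx):
    """Map each date token index to the index of its nearest event.

    Distance is the number of token positions between date and event.
    A semicolon between the two blocks the assignment. Ties go to the
    event listed first (an arbitrary choice in this sketch).
    """
    links = {}
    for d in date_idx:
        best = None
        for e in event_idx:
            lo, hi = sorted((d, e))
            if ";" in tokens[lo + 1:hi]:
                continue  # semicolon acts as a boundary
            distance = hi - lo
            if best is None or distance < best[0]:
                best = (distance, e)
        if best is not None:
            links[d] = best[1]
    return links
```

For the sentence ‘In 2002, the patient experienced some rectal bleeding and later had a colonoscopy’, the only event is ‘colonoscopy’, so the date is linked to it despite the many intervening words, which is the behavior a fixed word window would have prevented.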
Detection of concept certainty and status
To determine accurately whether a patient had undergone colonoscopy, the system had to establish concept certainty (eg, ‘never had a colonoscopy’) and one of six categories of status (see the accompanying table). For example, the algorithm distinguished between ‘had a colonoscopy’, ‘declined colonoscopy’, and ‘scheduled a colonoscopy’.
To detect status indicators, the authors created a lexicon of base word forms for each status category, which included single words (eg, ‘schedule’, ‘arrange’) and short phrases (eg, ‘overdue for’, ‘set up to have’). The algorithm used each word's part of speech to create negated forms for document processing purposes. For verbs, the algorithm correspondingly created additional verb forms representing different conjugations, tenses, and voices (such as the addition of auxiliary verbs to create the passive form of the verb). Therefore, for the lexicon verb form ‘schedule’, the algorithm would automatically generate other verb forms, such as ‘scheduled’, ‘will be scheduled’, and ‘was not scheduled’. See appendix 2, available online only, for an example of the full list of status words and examples of related variant phrases.
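A minimal Python sketch of this kind of variant generation is shown here. It uses a naive rule for regular verbs and a small, assumed set of auxiliaries; the study's actual generation rules and full variant lists are in its appendix and are not reproduced, so the function names and the exact set of forms are illustrative only.

```python
def past_participle(verb):
    """Naive regular-verb rule; irregular verbs would need a lookup table."""
    return verb + "d" if verb.endswith("e") else verb + "ed"

def verb_variants(base):
    """Generate tense, voice, and negated forms for one lexicon verb."""
    pp = past_participle(base)
    forms = {base, base + "s", pp, "will " + base, "did not " + base}
    for aux in ("is", "was", "will be", "has been"):
        forms.add(f"{aux} {pp}")  # passive form, eg 'will be scheduled'
        # Negated passive: 'not' goes after the first auxiliary word.
        first, _, rest = aux.partition(" ")
        forms.add(" ".join(w for w in (first, "not", rest, pp) if w))
    return forms
```

Calling `verb_variants("schedule")` produces a set that includes ‘scheduled’, ‘will be scheduled’, and ‘was not scheduled’, the three forms named in the text.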
Assignment of certainty and status to concepts
The algorithm uses the part of speech and verb type to assist in assigning the status to an event. For example, transitive verbs modify the event following them; passive verbs modify the events before them. However, if a status indicator appeared as the first or last phrase in a sentence, it could be applied to the concept that came after or before it, respectively, contrary to expected behavior. For example, in the sentence ‘Colonoscopy pt refused’, the algorithm expected the transitive verb ‘refused’ to modify a concept following the verb, but, lacking one, instead applied the ‘declined’ status to ‘colonoscopy’ because the sentence was a cryptic rewording of ‘patient refused colonoscopy’. To assign a status to an event, the algorithm required that the status indicator occur within four words of the event.
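The directional rule and the four-word limit could be expressed as in the Python sketch below. It treats the status indicator as a single token, takes the verb's voice as an input label (`"active"` or `"passive"`), and reads ‘within four words’ as an index difference of at most four; these simplifications and the function name are assumptions.

```python
MAX_GAP = 4  # the status indicator must occur within four words of the event

def assign_status(tokens, status_idx, voice, event_idx):
    """Return the index of the event that the status indicator modifies.

    Active (transitive) verbs prefer a following event; passive verbs
    prefer a preceding one. If the indicator is the first or last token
    and no event lies in the preferred direction, the other direction
    is tried. Returns None when no event qualifies.
    """
    after = [e for e in event_idx if 0 < e - status_idx <= MAX_GAP]
    before = [e for e in event_idx if 0 < status_idx - e <= MAX_GAP]
    preferred, fallback = (after, before) if voice == "active" else (before, after)
    if preferred:
        return min(preferred, key=lambda e: abs(e - status_idx))
    at_edge = status_idx in (0, len(tokens) - 1)
    if at_edge and fallback:
        return min(fallback, key=lambda e: abs(e - status_idx))
    return None
```

For `["Colonoscopy", "pt", "refused"]`, the verb at index 2 is active but nothing follows it, so the sketch falls back to the preceding event at index 0, mirroring the ‘Colonoscopy pt refused’ example.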
Determination of colonoscopy completion (receipt)
After processing all notes with the modified version of KMCI, the study evaluated ‘completed colonoscopies’ by comparing the algorithm's output with the gold standard physician review categorization. The study definition for KMCI-based colonoscopy completion determination was that all UMLS-derived colonoscopy concepts were associated with a past date or ‘today’, and that each had a status of either ‘receipt’ or ‘NULL’. Negated concepts were removed from consideration with respect to receipt.
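The completion rule can be summarized in a short Python sketch. The `Mention` record and its field names are hypothetical, and the sketch simplifies dates to fully specified values, whereas the study's normalized dates could contain placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Mention:
    """One recognized colonoscopy concept (hypothetical record layout)."""
    event_date: Optional[date]  # linked date, or None if none was assigned
    status: Optional[str]       # eg 'receipt', 'declined'; None stands for NULL
    negated: bool = False

def is_completed(mention, note_date):
    """Apply the receipt rule: not negated, dated today or earlier,
    and status either 'receipt' or NULL."""
    if mention.negated:
        return False
    if mention.event_date is None or mention.event_date > note_date:
        return False
    return mention.status in (None, "receipt")
```

Under this reading, a mention dated in the future or carrying a status such as ‘scheduled’ or ‘declined’ is excluded, as is any negated mention.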
Evaluation
The authors conducted a preliminary evaluation of the modified KMCI system. The primary study outcome measures were recall and precision for the algorithm-assigned determination of the dates of completed colonoscopies. The ability to recognize dates of colonoscopies is critical to providing real-time CRC screening-related decision support, and for enabling clinical research on screening compliance.
The evaluation, approved by Vanderbilt's institutional review board, randomly selected 200 patients aged 50 years or over who had had more than one primary care clinic visit in the previous year. The authors then used KMCI to identify colonoscopy concepts within all clinical EMR notes from the 200 patients (NB, not all patients had such references in their EMR records). Two physician-reviewers examined all of the sentences containing KMCI-identified references to colonoscopy. The reviewers did not examine any sentences without KMCI-identified colonoscopy concepts. Reviewers scored the algorithm timing and status outputs using a spreadsheet that showed each original sentence in its entirety, highlighted the algorithm-identified date and status words, and indicated the algorithm's interpretation of the date and status strings. Discrepancies between reviewers' determinations were resolved by consensus decision. Reviewers scored each algorithm-identified timing and status reference in each sentence as being true positive (TP, colonoscopy status and timing correctly coded by KMCI), false positive (FP, wrong status or timing descriptor or improperly indicated by KMCI as applying to the patient), true negative (TN, colonoscopy status correctly coded by the algorithm as not done or not known), or false negative (FN, colonoscopy status incorrectly coded by the algorithm as not done or not known when information in the sentence indicated to reviewers that the procedure was done or a timing or status indicator had not been picked up correctly by the algorithm). Recall (sensitivity) was calculated as TP/(TP+FN). Precision (positive predictive value) was calculated as TP/(TP+FP). F measure was calculated as the harmonic mean of recall and precision.
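The three metrics defined above translate directly into code. In the Python sketch below the function names are illustrative, and the counts in the usage note are hypothetical values chosen to show the arithmetic; they are not results from the study.

```python
def recall(tp, fn):
    """Recall (sensitivity) = TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Precision (positive predictive value) = TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

With hypothetical counts of 90 true positives, 10 false positives, and 30 false negatives, recall is 90/120 = 0.75, precision is 90/100 = 0.90, and the F measure is 2 × 0.90 × 0.75 / 1.65, or about 0.82.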
Physician reviewers scored each algorithm-identified temporal tag and status tag independently, so that recall and precision metrics could be calculated separately for each component of the algorithm. Then, reviewers determined from the original sentence whether each sentence indicated that the patient had received a colonoscopy (or not) on a given date (the gold standard). Therefore, for the sentence ‘colonoscopy was rescheduled from originally scheduled date, 3/04’, reviewers would consider a temporal tag a true positive if the algorithm correctly associated ‘colonoscopy’ with ‘2004–03–XX’ even if the status algorithm failed to identify ‘scheduling’ as a modifier for ‘colonoscopy’. Conversely, reviewers could mark the algorithm-assigned status as correct, even if the date was incorrectly interpreted. The reviewers also recorded date and status references omitted by the algorithm (false negatives).