We present a new model of patient record search, called SemanticFind, which goes beyond traditional textual and medical synonym matches by locating patient data that a clinician would want to see rather than just what they ask for. The new model is implemented by making extensive use of the UMLS semantic network, distributional semantics, and NLP, to match query terms along several dimensions in a patient record with the returned matches organized accordingly. The new approach finds all clinically related concepts without the user having to ask for them. An evaluation of the accuracy of SemanticFind shows that it found twice as many relevant matches compared to those found by literal (traditional) search alone, along with very high precision and recall. These results suggest potential uses for SemanticFind in clinical practice, retrospective chart reviews, and in automated extraction of quality metrics.
The need for a search function within the patient record was well documented even before the development of health information technology. With the advent of Electronic Health Record (EHR) systems, the amount of information that can be easily recorded in a patient record has increased. Traditionally, medical records were primarily used to document a patient’s medical history and clinical care process to assist physicians in providing informed care. However, today’s electronic medical records serve multiple functions, including patient scheduling, billing, coding, and documenting informed consent. As a result, physicians are finding it increasingly difficult to locate specific information within the patient record, making an effective search utility over electronic medical records all the more necessary.
Studies have identified various barriers to information search within medical records in the patient care process, including lack of time and doubt about the existence of relevant information. Due to difficulties in locating relevant information within medical records and the time constraints inherent to the clinical setting, physicians often leave questions unanswered at the point-of-care, with subsequent effects on clinical practice and patient care. By making information within the medical record more easily accessible, an efficient and effective search application could potentially reduce physicians’ cognitive load, improve patient care and reduce medical costs. A University of Michigan study found that use of a search engine optimized for finding clinical information in free-text within medical records resulted in significant time saved while maintaining reliability. In a small study, semantic search has been shown to reduce both search time and number of clicks.
While a few search tools on electronic medical records are evolving, they place the burden on users to find the relevant and useful information. A recent report discusses a graphical model and semantic inferencing over the graph for searching medical records, but it is evaluated as a document retrieval task, whereas we evaluate individual matches and present a new search application model. We discuss the use of UMLS (Unified Medical Language System) and distributional semantics in a prototype application called SemanticFind that has the ability to perform sophisticated searches and places the burden on itself rather than on the user. The user simply enters the search term, while the system performs a number of appropriate searches and organizes the results on the screen in a tabbed interface, where each tab represents results of a particular kind of search (see Figure 1); the prototype user interface is described more fully later.
Building an application that is able to find not only instances of the search term, but others that are related to it in a clinically meaningful way, presents three kinds of challenges: (1) performing the necessary natural language understanding and matching, (2) using general medical knowledge in order to drive and control the finding of useful relationships, and (3) achieving high usability: allowing the user to enter the search terms easily, and presenting the resulting search hits in an organized and readily understandable way. Because of these challenges the output produced may not always be what is expected, and therefore it makes sense to conduct an experimental evaluation of the system.
SemanticFind extends the common keyword-search paradigm to help the clinician find what he or she means by the keywords, rather than only what was literally typed. To that end, SemanticFind performs a variety of searches on a single query, including synonyms and paraphrases of the search term, negated and hypothetical mentions, and matches to other related medications, labs and procedures over both structured and unstructured data.
Even outside of the medical domain, users of keyword search systems run into difficulties when the material they are searching represents the concepts they are seeking using synonyms or paraphrases. In health care, the same applies but in more extreme ways; for example, some concepts are “latent”, requiring interpretation of test results. While we are not suggesting that SemanticFind perform diagnosis (or that necessarily any keyword-search system should do it), we do think it is reasonable that a search for “hyperkalemia” should match “K 6.1”.
Negation is also a property of medical texts in general and medical records in particular, that is critical for the correct understanding of the content, and by extension should be handled properly by a search application. Much of the patient record can be viewed as assertions of what is true of the patient (at the time of the recording); some of these assertions are of conditions that the patient is found not to have. Some medical properties are mutually exclusive (such as hyperX and hypoX for any X), so an assertion of one also asserts the absence of the other. It is useful for a search application to find not only what a user is looking for, but also any direct indications that it is absent.
Conditions are often present in a clinical note in a hypothetical context, where it is not asserted that the patient does or does not have a condition, but it may be mentioned as something to test for, as a risk the patient must be aware of, as the reason a medication or procedure is being given as a preventative measure, and so on. These will be valid matches when the condition is a search term, but should not be mixed in with the positively or negatively asserted matches. In a similar vein, search terms might be more general or more specific than a term in the record: again matching should occur, but the results organized so as not to conflate the results. We describe in this section the different kinds of matches that SemanticFind supports, and how they are organized and presented to the user.
We implemented a total of 13 different types of searches, which can be grouped into four classes, based on the technology used and the meaning of the corresponding matches. Most of the matching (in fact, all of the “conceptual” matching) is mediated by UMLS Concept Unique Identifiers (CUIs), as well as our own natural language processing tools (described in the next section). There are currently about 3 million distinct CUIs in UMLS, representing a large proportion of the distinct concepts in the medical domain. UMLS also maintains collections of relations between CUIs which we exploit (and in some cases have extended). Associated with each CUI is a “preferred name” and a set of “variants”, representing likely ways the concept will be expressed in text. We describe below the ways in which we increase the number of variations of the concept. Our general approach to search is to annotate the contents of the medical record with the CUIs associated with the contained medical terms, to similarly annotate the input search phrase with CUIs, and then to match them in several ways. The results of each kind of match are displayed in a separate “tab” in the graphical user interface – see Table 1.
The recognition of CUIs in text is performed by an annotation pipeline using UIMA (Unstructured Information Management Architecture, an industry standard framework for content analytics). This pipeline marks recognized medical concepts with annotations that record, amongst other data, the UMLS semantic type and the CUI of the concept. These semantic types include Finding, Diagnostic Procedure, ClinicalDrug, DiseaseOrSyndrome and about 130 others.
The pipeline consists of the following stages: tokenization, parsing and predicate-argument structure generation, dictionary lookup of concepts in our Medical Concept Dictionary (MCD)[i], lab value analysis and annotation, further concept recognition via a “concept transformer”, and negation & hypothetical detection.
The lab value analysis uses pattern and parse-based analysis to detect measurements of quantities, normalizing units where appropriate. If the quantity is found in our tables of reference ranges, the component asserts the appropriate condition, including normality. Thus “potassium was 8 mEq/L” gets annotated as “hyperkalemia”, but “potassium 4 mEq/L” gets annotated as “normal potassium”.
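The lab value step can be illustrated with a minimal sketch. It assumes a single regular expression in place of the pattern and parse-based analysis, skips unit normalization, and uses an illustrative potassium reference range; the table, the pattern and the function name `annotate_lab_values` are all hypothetical and are not taken from SemanticFind.

```python
import re

# Illustrative reference range only (an assumed value, not the authors' table).
# analyte -> (low, high, unit, condition_if_low, condition_if_high)
REFERENCE_RANGES = {
    "potassium": (3.5, 5.0, "mEq/L", "hypokalemia", "hyperkalemia"),
}

# Matches phrases such as "potassium was 8 mEq/L" or "potassium 4 mEq/L".
LAB_PATTERN = re.compile(
    r"(?P<analyte>[A-Za-z]+)\s+(?:was\s+|is\s+|of\s+)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+/[A-Za-z]+)"
)


def annotate_lab_values(text):
    """Return the condition asserted by each recognized lab measurement."""
    annotations = []
    for match in LAB_PATTERN.finditer(text):
        analyte = match.group("analyte").lower()
        if analyte not in REFERENCE_RANGES:
            continue  # no reference range known, so nothing is asserted
        low, high, unit, low_name, high_name = REFERENCE_RANGES[analyte]
        if match.group("unit").lower() != unit.lower():
            continue  # unit normalization is out of scope for this sketch
        value = float(match.group("value"))
        if value > high:
            annotations.append(high_name)
        elif value < low:
            annotations.append(low_name)
        else:
            annotations.append(f"normal {analyte}")
    return annotations
```

With these assumed ranges, `annotate_lab_values("potassium was 8 mEq/L")` yields the "hyperkalemia" annotation and `annotate_lab_values("potassium 4 mEq/L")` yields "normal potassium", mirroring the two examples in the text.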
Concept Transformer understands the different ways that are used to express modification in English (such as adjectives in front of the noun, prepositional phrases following), use of synonyms, and also how to convert between nouns and adjectives and vice-versa (via WordNet). For example, “pain in the abdomen” ≡ “abdominal pain”, “swelling of the ankle” ≡ “edematous ankle”, “heart problems” ≡ “cardiac problems” and so on.
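One of these rewrites, turning a trailing prepositional phrase or a noun modifier into an adjective, might be sketched as follows. This is a toy under stated assumptions: the small `NOUN_TO_ADJECTIVE` table stands in for the WordNet lookup, the pattern covers only "X in/of the Y" phrases, and synonym handling (as in the "edematous ankle" example) is not attempted.

```python
import re

# Hypothetical noun -> adjective table standing in for a WordNet lookup.
NOUN_TO_ADJECTIVE = {"abdomen": "abdominal", "heart": "cardiac"}

# "<head> in the <modifier>" or "<head> of the <modifier>"
PP_PATTERN = re.compile(r"^(?P<head>\w+) (?:in|of) the (?P<modifier>\w+)$")


def normalize(phrase):
    """Map a phrase to a canonical 'adjective noun' form where possible."""
    phrase = phrase.lower().strip()
    match = PP_PATTERN.match(phrase)
    if match:
        head, modifier = match.group("head"), match.group("modifier")
        return f"{NOUN_TO_ADJECTIVE.get(modifier, modifier)} {head}"
    words = phrase.split()
    if len(words) == 2:
        # Noun used as a modifier: "heart problems" -> "cardiac problems".
        return f"{NOUN_TO_ADJECTIVE.get(words[0], words[0])} {words[1]}"
    return phrase
```

Two surface forms are then treated as the same concept when they normalize to the same string, for example `normalize("pain in the abdomen") == normalize("abdominal pain")`.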
Negation Detection is the process by which assertions (typically findings and diseases) are detected in the scope of a negation trigger. Triggers include “no”, “not”, “never”, “without”, “absent”, “denies”, amongst others. Unlike the commonly used NegEx, our component uses the parse tree to determine which are the actual concepts that are asserted not to hold. Thus “swelling of the leg was not present” negates the swelling, not the leg. Hypothetical detection works in a similar way, except with its own set of triggers (“if”, “risk of”, “rule out”, “prophylaxis”, etc.).
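The difference between parse-based scope and proximity-based scope can be sketched with a hand-built tree. Everything here is an assumption made for illustration: the `Node` class, the role labels ("pred", "subj", "neg", "pp_obj") and the rule that a trigger attached to a predicate negates only the head of that predicate's subject. The actual parser output and scoping rules of SemanticFind are not described in enough detail to reproduce.

```python
from dataclasses import dataclass, field

# Trigger words listed in the text (the full list is longer).
NEGATION_TRIGGERS = {"no", "not", "never", "without", "absent", "denies"}


@dataclass
class Node:
    word: str
    role: str  # hypothetical labels: "pred", "subj", "neg", "pp_obj"
    children: list = field(default_factory=list)


def negated_concepts(root):
    """Return head words of subjects whose predicate carries a negation trigger."""
    negated = []
    stack = [root]
    while stack:
        node = stack.pop()
        has_trigger = any(
            child.role == "neg" and child.word in NEGATION_TRIGGERS
            for child in node.children
        )
        if has_trigger:
            # Only the subject head is negated, not the nouns nested below it.
            negated.extend(c.word for c in node.children if c.role == "subj")
        stack.extend(node.children)
    return negated


# Hand-built parse of "swelling of the leg was not present".
tree = Node("present", "pred", [
    Node("swelling", "subj", [Node("leg", "pp_obj")]),
    Node("not", "neg"),
])
```

For this tree, `negated_concepts(tree)` contains "swelling" but not "leg", which is the behavior the text describes. A window-based approach that negates every concept near "not" would have no structural reason to exclude "leg".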
This search, which is the only one which directly compares text strings, is very similar to the common “Control-F” functionality found in document reading/editing systems. The differences are
Results are shown in the Literal Match tab.
This search matches CUIs in the input with CUIs in the clinical document text from the medical record. Depending on the nature of the match, the result gets added to the appropriate tab: equal or synonymous matches are listed under Semantic Match, unless one is in a negative context or they are opposite or incompatible, when the Contradicted tab is used, or if the context is hypothetical or un-asserted, when Hypothetical is used. If an ISA relation or chain exists between the concepts, the More Specific or More General tab is used.
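The routing of a single match to a tab can be written as a small decision function. The sketch below is an interpretation, not the system's code: the identifiers are placeholders rather than real UMLS CUIs, the ISA and opposite tables are toy data, and the cases the text leaves open (for example a hypothetical mention of a more specific concept) simply return `None`. The handling of a negated, more general mention follows the “pain” example discussed with Figure 1.

```python
# Placeholder identifiers; these are NOT real UMLS CUIs.
ABDOMINAL_PAIN = "CUI_ABDOMINAL_PAIN"
PAIN = "CUI_PAIN"
PERIUMBILICAL_PAIN = "CUI_PERIUMBILICAL_PAIN"
HYPERKALEMIA = "CUI_HYPERKALEMIA"
HYPOKALEMIA = "CUI_HYPOKALEMIA"

# concept -> all of its ancestors along ISA chains (toy data)
ISA_ANCESTORS = {
    ABDOMINAL_PAIN: {PAIN},
    PERIUMBILICAL_PAIN: {ABDOMINAL_PAIN, PAIN},
}
# Mutually exclusive concept pairs (toy data)
OPPOSITES = {frozenset((HYPERKALEMIA, HYPOKALEMIA))}


def route_match(query_cui, mention_cui, context):
    """Choose a tab for one mention; context is 'asserted', 'negated' or 'hypothetical'."""
    if query_cui == mention_cui:
        if context == "negated":
            return "Contradicted"
        if context == "hypothetical":
            return "Hypothetical"
        return "Semantic Match"
    if frozenset((query_cui, mention_cui)) in OPPOSITES:
        # Asserting the opposite concept contradicts the query concept.
        return "Contradicted" if context == "asserted" else None
    if mention_cui in ISA_ANCESTORS.get(query_cui, set()):
        # Mention is more general; negating it also negates the query concept.
        if context == "negated":
            return "Contradicted"
        return "More General" if context == "asserted" else None
    if query_cui in ISA_ANCESTORS.get(mention_cui, set()):
        return "More Specific" if context == "asserted" else None
    return None
```

For instance, `route_match(ABDOMINAL_PAIN, PAIN, "negated")` returns "Contradicted", while `route_match(ABDOMINAL_PAIN, PERIUMBILICAL_PAIN, "asserted")` returns "More Specific".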
This search uses Latent Semantic Analysis (LSA), which quantifies the degree to which two concepts are associated by analyzing their distributional properties in a corpus of training data. We know the semantic types of the concepts involved, but LSA does not give us the kind of relation, just a score representing the probability it exists. We use a threshold of 0.5, on a scale of 0 to 1. The UMLS semantic type of the concept in the document determines the tab that the match is associated with.
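A minimal sketch of the thresholding step follows. It assumes the association score is a cosine similarity between concept vectors in the LSA space, clipped to the range 0 to 1, and that a score equal to the threshold passes; neither detail is stated in the text. The mapping from semantic type to tab is hypothetical, and the "Assoc. Medications" tab name in particular is invented for illustration.

```python
import math

THRESHOLD = 0.5  # threshold stated in the text, on a scale of 0 to 1

# Hypothetical mapping from UMLS semantic type to result tab.
TAB_FOR_SEMANTIC_TYPE = {
    "Finding": "Assoc. Tests/Findings",
    "DiagnosticProcedure": "Assoc. Tests/Findings",
    "ClinicalDrug": "Assoc. Medications",  # invented tab name
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associated_tab(query_vec, concept_vec, semantic_type):
    """Return the tab for an associative match, or None if below threshold."""
    score = max(0.0, cosine(query_vec, concept_vec))  # clip negatives to 0
    if score < THRESHOLD:
        return None
    return TAB_FOR_SEMANTIC_TYPE.get(semantic_type)
```

Note that the tab depends only on the semantic type of the document concept, since the score says how strongly two concepts are associated but not what kind of relation holds.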
In addition to clinical notes, medical records contain structured collections of quantities related to the patient’s care. These include ordered medications, procedures and lab tests, amongst others. Our inferential search makes connections between search terms and these quantities by chaining together one or more relations in medically- and logically-meaningful ways.
UMLS contains many instances of the treats, prevents and causes relations associating medications or procedures with symptoms and diseases. When the search term is a condition, it may be useful to match it against medications and procedures ordered for the patient which are related to it in one of these ways, but because of mismatched granularity/specificity the appropriate UMLS relation will often not directly apply. Our inferential search employs general rules such as:
To allow for looser terminology when specifying search terms and maximum recall during matches, these rules are meant to uncover all possible correct relations rather than only identifying chains that are always true. In addition, note that these rules do not take account of conventional practice or whether the found connection (such as medication treating a condition) is the best possible. Rather they find chains that are logically correct and of potential interest, given that one end is the search term, and the other is a quantity associated with the patient.
By appropriate chaining of relations (from UMLS) according to rules such as those above (developed by us) we get inference chains such as the following when the search term is infection:
Infection <includes> Lower respiratory tract infection <treated by> Amoxicillin <is ingredient of> Augmentin 875 mg-125 mg tablet
which is in the patient’s ordered medications list. Note that we are not asserting that this medication was ordered for this purpose – such treatment connections are not always explicitly made in the medical record, only that this medication may be related to the issued search term due to a possible relation between the two concepts.
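The chain above can be reproduced with a breadth-first search over relation triples. This is only a sketch: the three triples are copied from the example, whereas the real system draws on UMLS relations and on rules that restrict which relation sequences are meaningful. The unconstrained search, the `max_hops` limit and the function name `find_chain` are assumptions.

```python
from collections import deque

# Relation triples taken from the example chain in the text.
RELATIONS = [
    ("Infection", "includes", "Lower respiratory tract infection"),
    ("Lower respiratory tract infection", "treated by", "Amoxicillin"),
    ("Amoxicillin", "is ingredient of", "Augmentin 875 mg-125 mg tablet"),
]


def find_chain(search_term, ordered_items, relations=RELATIONS, max_hops=4):
    """Return the shortest relation chain from search_term to any ordered item."""
    graph = {}
    for source, relation, target in relations:
        graph.setdefault(source, []).append((relation, target))

    queue = deque([(search_term, [search_term])])
    seen = {search_term}
    while queue:
        concept, path = queue.popleft()
        if concept in ordered_items and concept != search_term:
            return path
        if (len(path) - 1) // 2 >= max_hops:
            continue  # path alternates concept/relation, so hops = (len - 1) / 2
        for relation, target in graph.get(concept, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [f"<{relation}>", target]))
    return None


chain = find_chain("Infection", {"Augmentin 875 mg-125 mg tablet"})
```

Joining `chain` with spaces gives the same chain as in the example. Keeping the whole path, rather than just the endpoint, is what allows an explanation of the match to be shown to the user.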
The user interface is intended to demonstrate how the search results could be organized, as an aid to understanding the functionality of SemanticFind; as mentioned earlier, assessing the effectiveness of the UI is not a goal of this study.
SemanticFind organizes the results on the screen in a tabbed interface, where each tab represents results of a particular kind of search (see Figure 1). The contents of each tabbed pane depend on whether the search was on unstructured (clinical notes) or structured content (such as ordered medications, procedures, or labs). Each tabbed pane is split horizontally into two parts; one part summarizes all results of this particular kind of search, while the other part allows detailed view of the matched content within the context of the original unstructured document or structured table. For any given search, only tabs containing at least one search result are presented.
For unstructured search results, on the left is a hit list of matched clinical notes, on the right the contents of the currently selected note. In the hit list each row corresponds to a clinical note with one or more matched terms, and these are arranged in reverse chronological order (most recent at the top). Each entry is in three columns, representing the date/id of the note, the kind of note, and a summary of the text of the matches found in the note. In the content panel on the right, the matches are highlighted within the selected note.
Structured search results are similar, but with a few differences. Instead of the note type, an explanation is given of the reason for the match between the search term and the structured item. This is felt to be useful to the user since the connection might not always be obvious. In the right-hand pane, the corresponding entry from the structured repository is given. We are experimenting with alternate displays, such as a timeline of medication orders against dosage.
The screenshot in Figure 1 shows the result of issuing the query “pain in the abdomen”. There is no Literal Match of the string, so that tab is absent. Under Contradicted there are three notes, each with a match to one or more of the following terms: “abdominal pain”, “cva tenderness”, “pain”. Even though “pain” is a more general concept than “pain in the abdomen”, it is located in the Contradicted match since negating the former logically negates the latter.
The Contradicted tab is selected in this screenshot, as is one of the notes in the hit-list on the left. In the right panel, the note content is shown, with the matching terms highlighted.
Other tab panes show matches such as “periumbilical pain” in More Specific, “constipation” in Assoc. Tests/Findings, and “acetaminophen 325 mg tablet” in Ordered Medications. We do not show screenshots of these other tabs for reasons of space; instead we summarize in Table 2 the matches found across tabs.
Unlike traditional information retrieval systems, which display search “hits” based on a relevancy score (the highest at the top), SemanticFind organizes its results under various categories and in reverse chronological order. Categorization of the results helps users easily navigate to what they want to know about the search term. The reverse chronological ordering is common to displaying patient information in the clinical setting and familiar to users of most EHR systems, where temporal proximity to the present time may be more useful. For example, a ten-year-old clinical note with a well-documented discussion of the treatment of hypertension may have the highest relevancy score in a traditional search system for a search on “hypertension”, but may not be very relevant to the patient’s care today.
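The organization described here, grouping by category and then ordering by date rather than by score, might look like the following sketch. The tuple layout, the sample dates and note ids, and the function name `organize_hits` are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def organize_hits(hits):
    """Group hits by tab and order each group newest first.

    hits: iterable of (tab, note_date, note_id) tuples.
    Tabs with no hits never appear in the result.
    """
    by_tab = defaultdict(list)
    for tab, note_date, note_id in hits:
        by_tab[tab].append((note_date, note_id))
    return {tab: sorted(rows, reverse=True) for tab, rows in by_tab.items()}


# Made-up hits for illustration only.
hits = [
    ("Contradicted", date(2011, 3, 2), "note-17"),
    ("More Specific", date(2012, 1, 9), "note-25"),
    ("Contradicted", date(2012, 6, 30), "note-31"),
]
organized = organize_hits(hits)
```

No relevance score is involved at any point: within the "Contradicted" group the 2012 note simply precedes the 2011 note.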
A study was conducted to evaluate the performance of some of the conceptual matches performed by our system. Semantic Match, Contradicted and More Specific were evaluated. Hypothetical was not operative at the time. More General was excluded due to anticipated difficulties in judging matched concepts that may be too broad to be of use. Other matches that SemanticFind can make (see Table 1) are of an associative or logical nature, and are also not as straightforward to evaluate, so are omitted from the current evaluation.
The traditional way to evaluate search systems is to perform document relevance judgments on the returned document lists. We used a different methodology for assessing SemanticFind because:
These differences imply a qualitatively different form of evaluation than in traditional Information Retrieval. The latter generates a ranked list of documents whose relevance is judged, and measures are calculated based on the scores of the top N documents for some N, or for all documents that pass a given score threshold. Our judgments are on individual mention matches, which are binary, so ranking-based metrics are meaningless. Instead, as described in the Metrics section later, we count matches that are correct (true positives), matches that are incorrect (false positives) and misses (false negatives), and compute evaluation scores from these counts.
True and false positives are evaluated by showing evaluators the matches in context (see Evaluation Interface). Misses can be determined in principle by tasking annotators to carefully read each clinical note and, for each search term, note any textual expressions that represent it (via any of the match types of interest). Any of these that the system does not find automatically is a miss. However, this human process is extremely time-consuming and error-prone, and for that reason we designed an alternative approach involving the generation of paraphrases. We reasoned that the set of search-term equivalents that an annotator would identify in text would be close enough to the paraphrases they could generate from the search term in a stand-alone task.
Through a research collaboration agreement with the Cleveland Clinic, we acquired de-identified medical records available for our use. Ten of these were selected at random. For each patient record, an MD (Liang) generated a set of 12 or more search terms pertinent to that patient by reviewing the patient’s last progress note and problem list. This method of generating search terms was meant to simulate what healthcare practitioners might ask at the point-of-care, where they may review the patient’s most recent progress note to get a quick overview of the patient prior to the patient encounter. These search terms were then run through SemanticFind and the output reviewed by two fourth-year medical students recruited to participate in the evaluation.
A total of 169 search terms were generated over 10 patient records, ranging from 13 to 32 search terms per record. Search terms included a variety of semantic types (e.g. diseases, findings, labs, medications), single and multi-word concepts, different parts of speech, as well as commonly accepted medical abbreviations. Of the 169 search terms, 134 were unique.
To aid in computing system misses, the assessors were then asked to generate paraphrases for each of the 134 unique search terms. They were instructed to generate as many paraphrases as they could with a few minutes’ consideration (including none if they felt it was not possible for a particular term), where, according to their judgment, the paraphrases meant clinically the same as the original term. We explain how these paraphrases were used in evaluation in the Metrics section below.
A total of 652 paraphrases were generated, ranging from 0 to 13 paraphrases per search term. These alternatives were generally in the form of abbreviations, abbreviation expansions, definitions and general English paraphrases. Table 3 shows the paraphrases for some of the designated search terms for a particular patient record.
For evaluation, the interface is enhanced with widgets to allow the user to enter judgments (not shown for reasons of space). In evaluation mode, the assessor will issue a search term, choose a search type, then proceed to examine each note that contains matches of that type, click on those matches in the note view panel, and enter an assessment of GOOD or BAD. We note that since the intent of the end-user in issuing a search term is not known, for each matched concept, the assessment of GOOD is taken to mean that under some reasonable circumstances the located concept is a desirable match, while BAD means under no reasonable circumstances. Similarly, in cases where the meaning of the search term is ambiguous due to lack of knowledge of the clinical context in which the search term was generated (e.g. “cervical biopsy” as a search term matched to both concepts referring to the cervical spine as well as those referring to the female cervix), all interpretations of the search term were considered GOOD given that the interpretation is a reasonable explanation of the search term.
In this evaluation, Precision and Recall were computed. Precision is the number of true positives found divided by the total number of system assertions (i.e. true positives plus false positives), and thus is equivalent to Positive Predictive Value. Recall is the number of true positives found divided by the number of positives “in truth” (i.e. true positives plus false negatives), and thus is equivalent to Sensitivity.
For Precision, assessors were asked to review the output from SemanticFind for the three search types being evaluated and mark each matched instance as GOOD or BAD using the context available in the full clinical note. If a search term was in a positive context but showed up as a Contradicted match, or was in a negative context and showed up as a Semantic/More Specific match, it was marked as BAD.
At the end of the judgment session, the total number of GOOD judgments gives us the true positive (TP) count, the total number of BAD judgments the false positive (FP) count.
Calculating Recall requires having ground truth in which assessors have found every true match instance in the corpus. In theory this would be done by having assessors read every clinical note and note every instance of a match for every keyword in the experiment. Because we found this method to be both extremely time-consuming and error-prone, we adopted an alternative approach. The assessors were asked to come up with as many clinically-equivalent paraphrases as they could (including none) for each of the search terms, as described earlier. SemanticFind processed these expansions and counted how often the GOOD matches in the Precision experiment failed to match instances of the paraphrases in the patient record, which were automatically located via Literal Match – these are false negatives (FN). Since Precision = TP/(TP+FP) and Recall = TP/(TP + FN), having the counts for Precision plus FN allows us to compute Recall.
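The two formulas can be turned directly into a short function over the assessors' judgments. The function name and the toy counts used in the usage note are illustrative; they are not the study's data.

```python
def precision_recall(judgments, false_negatives):
    """Compute (precision, recall) from GOOD/BAD judgments and a miss count.

    judgments: iterable of "GOOD" (true positive) or "BAD" (false positive).
    false_negatives: number of paraphrase instances the system failed to match.
    """
    judgments = list(judgments)
    tp = sum(1 for j in judgments if j == "GOOD")
    fp = sum(1 for j in judgments if j == "BAD")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + false_negatives) if tp + false_negatives else 0.0
    return precision, recall
```

As a toy case, eight GOOD judgments, two BAD judgments and two misses give a precision of 8/10 and a recall of 8/10. Only the false negative count comes from the paraphrase procedure; the TP count is shared between the two measures.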
A total of 13579 matches over 10 patient records were found by SemanticFind and evaluated by two assessors, with an overall Precision of 0.87 over the entire dataset, as shown in Table 4.
We report two values for Recall as calculated using the search term paraphrases generated by the assessors. This is to show the difference between including and excluding search term paraphrases that are not valid UMLS concepts (either UMLS variants or variants found by our additional processing). When used interactively, SemanticFind tells the user if the input term is not a known concept, and gives the user an opportunity to rephrase. When run in this batch evaluation mode, this interaction was absent. Table 5 shows the evaluation results and Recall values. The constrained numbers represent Recall when the unknown concepts are dropped, while the unconstrained numbers represent Recall when considering all paraphrases provided including those that are not valid UMLS concepts.
Finally, we analyzed how many additional GOOD matches, beyond those found by Literal Match, were achieved by Semantic Match, More Specific and Contradicted Match. The Semantic Matches include all Literal Matches that are recognized concepts; matches associated with Semantic Match, More Specific and Contradicted Match are exclusive (i.e. a given text string will be matched by at most one of those). The results are presented in Table 6.
Table 6 shows, in an incremental fashion, how Semantic Match, More Specific and Contradicted augment those found by Literal Match, which represents traditional Control/F matching. We see that all three judged matches together found twice as many matches as the baseline Literal Match. On the assumption that literal matches are by definition correct[ii], we conclude that in our data, half of desired matches are being missed by performing traditional search alone. A stronger baseline would be the Semantic Match result, corresponding roughly to a system that did traditional search with synonym expansion. Relative to that, the combined total of good matches is 68% greater. Either way, we are in good agreement with the work of Koopman et al., who (despite theirs being a document retrieval task) also showed that addressing the semantic gap uncovered many more match results.
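The two comparisons can be related with a little arithmetic. The symbols below are introduced here for illustration: $L$ is the number of GOOD Literal Matches, $S$ the number of GOOD Semantic Matches, and $T$ the combined total over Semantic Match, More Specific and Contradicted. Both stated ratios are rounded, so the derived figures are approximate.

```latex
\begin{align*}
T &\approx 2L
  &&\text{(twice as many matches as Literal Match)} \\
\frac{L}{T} &\approx \frac{1}{2}
  &&\text{(literal search alone finds about half)} \\
T &\approx 1.68\,S
  &&\text{(combined total is 68\% greater than Semantic Match)} \\
S &\approx \frac{2L}{1.68} \approx 1.19\,L
  &&\text{(implied by the two ratios above)}
\end{align*}
```

Under these rounded figures, synonym-level matching alone would account for roughly a 19% gain over literal search, with the remainder of the doubling coming from the More Specific and Contradicted matches.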
Overall, as compared to the traditional literal match (Precision up to 1.0 and Recall 0.49), SemanticFind has a lower precision but higher recall. By informing the user on entering a search term whether the input term is a known concept and allowing the user the opportunity to rephrase, search terms can be constrained to be recognized concepts, resulting in an 11-point improvement in Recall from 0.87 to 0.98. Error analysis showed that the errors are spread amongst a variety of causes, mostly tokenization errors and user spelling errors. Precision is also high (0.87) with a substantial percentage of the errors being due to a sentence-end detection problem, which we now explain.
We discovered that the somewhat informal formatting of manual note writing had been causing the system to concatenate consecutive lines as a single sentence when there was no terminal period in the earlier one, causing many instances of incorrect concept detection, specifically incorrect negation analysis. For example, the following text
alcohol use: no
would give rise to recognition of “no smoking”, which would generate the wrong polarity match when the search term was “smoking”. From sampling about half of the error cases, we estimate that 30% of the system’s false positives (precision errors) were due to this problem.
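The failure mode, and one possible remedy, can be sketched with two toy sentence splitters. The remedy shown (treating every line break as a sentence boundary) is an assumption, not a fix reported by the authors, and the second line of the sample note is a made-up stand-in for a line beginning with "smoking".

```python
def naive_sentences(note):
    """Join all lines, then split on periods only (the failure mode described)."""
    joined = " ".join(line.strip() for line in note.splitlines() if line.strip())
    return [s.strip() for s in joined.split(".") if s.strip()]


def line_aware_sentences(note):
    """Also treat each line break as a sentence boundary."""
    sentences = []
    for line in note.splitlines():
        # Splitting on "." is a toy rule; it would also break decimal numbers.
        sentences.extend(s.strip() for s in line.split(".") if s.strip())
    return sentences


# First line is from the text; the second line is hypothetical.
note = "alcohol use: no\nsmoking: 1 pack per day"

merged = naive_sentences(note)
separate = line_aware_sentences(note)
```

In `merged` the single sentence contains the spurious string "no smoking", whereas in `separate` the trigger "no" and the term "smoking" fall in different sentences. The trade-off is that a line-aware splitter would over-split prose that has been hard-wrapped across lines.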
In clinical practice, the different types of search performed by SemanticFind may be used to address various information needs that arise at the point-of-care. Depending on the question the user has in mind, one or more types of search performed may be of interest. This highlights another advantage of SemanticFind, the ability to help the user locate relevant information without requiring construction of a complex query. It does so by taking any search term and returning clinically meaningful matches on multiple dimensions. So for a patient who comes in with a vague description of his or her medical history but does not know specific details of it, for example a patient with “heart disease”, issuing “heart disease” on SemanticFind will match general references to cardiac disorders as well as specific types of cardiac problems such as cardiomyopathies and arrhythmias, thereby helping the user determine what specific type of heart problem the patient has. The same search will also return matches on related cardiac medications, tests, and procedures in both structured and unstructured data, providing further information on past and current management of the patient’s heart problem.
This study has several limitations. First, our medical records all originate from a single institution, Cleveland Clinic. Although health care professionals in general have a common shared vocabulary and terminology, it is likely that health care practitioners within a specific institution may have a shared bias for preferred terminology or syntax for clinical documentation. However, our system was built based on UMLS concept ontology and not tuned specifically to medical records from a specific institution, and therefore should perform just as well on medical records from any other institution. Second, the search terms used to evaluate the system were generated in an artificial setting, although efforts were made to simulate what a user at a point-of-care setting would want to search for.
For a number of reasons, there is considerable “dark matter” in medical records when it comes to text search. Terms that the user is interested in finding can be expressed as synonyms, paraphrases, more specific or more general variants, logical equivalents and even in terms of their opposites – all of these can hinder the process of locating them when traditional Find or Control/F searches are used. In this paper, we describe and evaluate SemanticFind, an application that supports the location of medical concepts in both the clinical notes and structured areas of a medical record by employing various types of search. An evaluation of three specific search types, Semantic Match, More Specific and Contradicted, showed that SemanticFind performed with high precision (0.87) and recall (0.87, rising to 0.98 when search terms are constrained to recognized concepts), and, in our data, uncovered twice as many potentially useful matches in the clinical notes as would be found by traditional search alone. With these results, a user can be confident that when using SemanticFind, very few desired matches will be missed, and very few false matches will be found. Considering the variability in clinical language and abundance of content within medical records, we believe that SemanticFind can be a great boon to productivity for any tasks requiring medical record search, whether that is in a point-of-care clinical setting or in retrospective chart reviews for research purposes.
[i] This is similar conceptually to MetaMap, but only exact matches against UMLS concept preferred names and variants are allowed.
[ii] This is not technically true if we consider a match of a homograph (a different word spelled the same way) to be incorrect.