|Home | About | Journals | Submit | Contact Us | Français|
Objectives: To test the hypothesis that most instances of negated concepts in dictated medical documents can be detected by a strategy that relies on tools developed for the parsing of formal (computer) languages—specifically, a lexical scanner (“lexer”) that uses regular expressions to generate a finite state machine, and a parser that relies on a restricted subset of context-free grammars, known as LALR(1) grammars.
Methods: A diverse training set of 40 medical documents from a variety of specialties was manually inspected and used to develop a program (Negfinder) that contained rules to recognize a large set of negated patterns occurring in the text. Negfinder's lexer and parser were developed using tools normally used to generate programming language compilers. The input to Negfinder consisted of medical narrative that was preprocessed to recognize UMLS concepts: the text of a recognized concept had been replaced with a coded representation that included its UMLS concept ID. The program generated an index with one entry per instance of a concept in the document, where the presence or absence of negation of that concept was recorded. This information was used to mark up the text of each document by color-coding it to make it easier to inspect. The parser was then evaluated in two ways: 1) a test set of 60 documents (30 discharge summaries, 30 surgical notes) marked-up by Negfinder was inspected visually to quantify false-positive and false-negative results; and 2) a different test set of 10 documents was independently examined for negatives by a human observer and by Negfinder, and the results were compared.
Results: In the first evaluation using marked-up documents, 8,358 instances of UMLS concepts were detected in the 60 documents, of which 544 were negations detected by the program and verified by human observation (true-positive results, or TPs). Thirteen instances were wrongly flagged as negated (false-positive results, or FPs), and the program missed 27 instances of negation (false-negative results, or FNs), yielding a sensitivity of 95.3 percent and a specificity of 97.7 percent. In the second evaluation using independent negation detection, 1,869 concepts were detected in 10 documents, with 135 TPs, 12 FPs, and 6 FNs, yielding a sensitivity of 95.7 percent and a specificity of 91.8 percent. One of the words “no,” “denies/denied,” “not,” or “without” was present in 92.5 percent of all negations.
Conclusions: Negation of most concepts in medical narrative can be reliably detected by a simple strategy. The reliability of detection depends on several factors, the most important being the accuracy of concept matching.
Medical narrative, consisting of dictated free-text documents such as discharge summaries or surgery notes, is an integral part of the clinical patient record. Databases containing free-text medical narrative often need to be searched to find relevant information for clinical and research purposes. Researchers in the field of information retrieval have devised general methods for processing free-text documents so that relevant documents can be returned in response to queries.1,2 One aspect of processing is the indexing of text to improve the effectiveness and speed of its subsequent search. Most commercial, general-purpose information retrieval engines typically index the words in documents; few of them make use of synonym information that inter-relates words or phrases.
To eliminate the need for users to specify synonymous forms of keywords manually when searching a database of documents, and to thereby increase the sensitivity of the search, documents pertaining to a specific domain may also be concept indexed. Here, phrases in the document are identified and matched to concepts in a domain-specific thesaurus. Automated concept indexing based on the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus3 or its MeSH (Medical Subject Headings) subset has been explored by several researchers.4–8
An important aspect of information-retrieval-based search is the ranking of matching documents by relevance,9,10 giving more weight to documents containing the specified keywords many times and to documents that contain keywords that are rarer in the collection as a whole. For a medical document, however, the presence of a concept does not necessarily make the document relevant for that concept. The concept may refer to a finding that was looked for but found to be absent or that occurred in the remote past.
Negation is a fundamental operation that can invert the “sense” of a sentence or document. In query of large medical free-text databases, the presence of negations can yield numerous false-positive matches, because medical personnel are trained to include pertinent negatives in their reports. Thus, in a search for “fracture” in a radiology reports database, 95 to 99 percent of the returned reports would state “no evidence of fracture,” or words to that effect. Therefore, to increase the utility of concept indexing of medical documents, it is necessary to record whether the concepts have been negated or not.
Negation in natural language can be extremely subtle, and insertion or omission of a single word can completely change the scope and force of a negation and even reverse its polarity.11,12 In medical narrative, however, we expect negations to be much more direct and straightforward, since clinicians are trained to convey the salient features of a case concisely and unambiguously, as the cost of miscommunication can be very high. We therefore hypothesized that negations in dictated medical narrative are unlikely to cross sentence boundaries and are also likely to be simple in structure. Simple syntactic methods to identify negations might therefore be reasonably successful, as they are in computer languages. In particular, most negations in medical documents might be identified, providing the documents are suitably pre-processed with the aid of a lexical scanner that carries out finite state processing and a parser that uses a context-free grammar, using tools of the type used for constructing computer language compilers.
We describe the design of a program for negation recognition (Negfinder) based on existing technology for implementing parsers based on context-free grammars. Such technology has been used successfully for processing both natural language13 and formal (computer programming) languages.14 Negfinder can be used in production mode as part of a pipeline of programs used to create a concept index for a collection of documents. Its output consists of information necessary to index the occurrence of a concept in the document, as described later.
Syntactic parsing mechanisms have been designed extensively for two kinds of grammars—context-free grammars (CFGs) and the more constrained regular or finite-state grammars (FSGs). Finite-state grammars can handle regular expressions, but they cannot handle nested structures of arbitrary depth, whereas CFGs can. This is important, because nested structures are common both in natural languages (“This is the cat that killed the rat that ate the malt that…”) and in formal (computer) languages (“if A then if B then if C then…”). On the other hand, since FSGs are more constrained than CFGs, it is computationally easier to implement efficient parsers for them.
Context-free grammars have been the grammars of choice in both natural language and formal languages, but often the need for computational tractability and efficiency have led investigators to use a restricted subset of CFGs15 or FSGs16 in constrained domains. For instance, the compiling and parsing of computer languages has been streamlined by the use of a restricted subset of CFGs called LR grammars, for which powerful software tools have been developed. In particular, the well-known utilities lex and yacc,17 originally developed for UNIX at AT&T Bell Laboratories, generate highly efficient parsers for a subclass of LR grammars called LALR(1) grammars, and their widespread use is one reason that almost every computer programming language in use today conforms to an LALR(1) grammar. LALR(1)—for Look-Ahead, Left-Recursive (1)—refers to a grammar that uses a look-ahead of one token to match input to a parsing rule, and where recursive constructs are specified left-most in a rule. (A token may map to a single word [lexeme] or to a lengthy phrase.)
Lex generates lexical scanners or analyzers (“lexers”) that work cooperatively with parsers that are generated by yacc. The programmer provides input to lex through a specification based on regular expressions,18 while yacc input is specified through a set of parsing rules written using a modification of Backus-Naur form, which is commonly used to specify a CFG. Lex generates a deterministic finite state machine or automaton (DFA) similar to that generated by parsers of FSGs. The DFA is a program that classifies strings in the input data into various lexical classes in a very time-efficient (though not necessarily space-efficient) manner. During parsing, the lexer sends the parser a token at a time based on patterns that it recognizes: several contiguous lexemes can be combined into a single token. The parser uses information about the token just received to activate an appropriate parsing rule, and the entire input is eventually transformed into a unique parse tree after several parsing rules have been applied in succession.
A number of complexities in the nature of natural language, such as ambiguous syntax, challenge any treatment of negation. Negation itself, in natural languages, is far more subtle and complex in force and scope than it is in formal logic and computer languages, where not p is simply the polar opposite of p. This has been extensively documented with numerous examples by Horn11 and by Horn and Kato.12 Consider the following example based on Horn19:
In sentence 1, the negative operator “not” scopes over “tired”. In sentence 2, the scope of “not” is still over “tired,” but the intercalated phrase “a bit” has increased the force of the negative. In sentence 3, the intercalated “a little,” a synonym of “a bit” in an affirmative sentence, redirects the scope of “not” to itself and away from “tired,” which is affirmed instead of negated, with intermediate force.
If the subtle negations and ambiguities that Horn and others have documented were commonplace in medical narrative, full-blown natural language understanding would be needed to detect them. However, as stated earlier, we expect that most negations in medical narrative would be straightforward in type, scoping over words only in their immediate vicinity, and therefore identifiable without extensive syntactic analysis of sentence structure.
A general-purpose approach to handling negations in queries of information retrieval databases was described by McQuire and Eastman.20 (This problem is simpler than negations in free text, because queries are limited to relatively simple sentence structures that serve the purpose of interrogation.) These authors created a query front-end where, if ambiguity was detected, the user was asked to clarify which constituents of the query they intended to be negated. Their work indicates that it is not possible for a system to automatically disambiguate all uses of negation in queries.
A limited amount of previous work has addressed negation in medical narrative in specific contexts.21,22 NegExpander, by Aronow et al.,21 identifies a small number of negation variants to classify radiology (mammography) reports; in our pilot studies, these were seen to have the simplest negations. NegExpander does not do any syntactic parsing beyond detection of a small set of negative words and conjunctions, but it does expand conjunctive phrases and makes explicit the negation of each component concept. This is a critical function for a general-purpose negation detector, as is shown later.
MedLEE,22 by Friedman et al., does sophisticated concept extraction in the radiology domain and has been recently extended to handle discharge summaries. MedLEE has been ported to the Web23 and has recently been augmented by representing the extracted information in XML (Extended Markup Language).24 The latest version of MedLEE combines a syntactic parser with a semantic model of the domain. MedLEE recognizes instances where negatives are followed by words or phrases that represent specific semantic classes, such as degree of certainty, temporal change, or a clinical finding. It also identifies patterns where only the following verb is negated and not a semantic class (e.g., “X is not increased”). For a general-purpose negation detector, the coverage of both these types of negations needs to be exhaustive enough to have a high rate of success in diverse medical narrative.
To develop an open-ended semantic model for all medical narrative is an immense task. In its absence, negation of medical concepts in text can be determined if the concepts themselves are identified in a pre-processing step. Previous work by Nadkarni et al.25 explored the feasibility of using the UMLS for this purpose; the concept-recognition algorithm described by them is used in the present work, with some augmentations. The algorithm correctly identified 76 percent of concepts (true-positive results) in a test set of documents; the error rate of 24 percent, however, was considered too high for concept indexing to be the only production-mode means of pre-processing medical narrative.
Among the causes of problems in matching were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors. Some of these problems are discussed in more detail below, as they can significantly affect the detection of negations.
The complexities of natural language semantics are such that even if negation signals in medical narrative do turn out to be simple as we have hypothesized, problems with the accurate recognition of concepts may make it difficult to determine what exactly is being negated. Two specific problems in concept matching that can confound negation detection are composite concepts and homonyms.
As medical knowledge has evolved, simpler concepts have been combined to form higher-level concepts to refer to them concisely. An example is elevation of blood pressure due to kidney disease, which is called nephrogenic hypertension. Higher-level concepts in common usage may enter the UMLS eventually. However, given individual concepts, these can potentially be combined into 2 pairs and 3 triplets. Many of these combinations may be meaningless, whereas others may be perfectly valid composite concepts created on the fly (e.g., “the distal articular surface of the second left metatarsal”). One would expect that the majority of such meaningful combinations will not be recorded as distinct entities in the UMLS.
We define a composite concept as a phrase that does not match completely to a single UMLS concept but whose parts do. An example is digitalis-induced atrial fibrillation, which does not exist in the UMLS; parts of the phrase, however, separately match the UMLS concepts digitalis and atrial fibrillation. A practical problem with composite concepts is that if they are negated, it does not necessarily mean that the individual concepts that the composite comprises have been negated. For example, if digitalis-induced atrial fibrillation was ruled out, it might be that atrial fibrillation itself was present, but that digitalis was ruled out as a cause.
Homonyms are terms that may have multiple meanings. For example, “anesthesia” may refer to a procedure ancillary to surgery or to a clinical finding of loss of sensation. The UMLS records 13,000 ambiguous terms, but this list is by no means complete. For example, “supine,” which is not in the list, could, if encountered in text, imply either of the UMLS concepts “supine position” or “supine decubitus.” (The latter is a subset of the former, because “decubitus” implies position of the torso.)
If a phrase that is an unrecorded homonym is encountered in text, the concept matching process fails. All we discover is that “supine” is a word that is part of several terms for concepts; there is no simple way to differentiate it from words of no medical importance. This creates a problem when a homonym is negated. Although a negation detection program may correctly identify that the homonym lies within the scope of the negation, its index of negated concepts will not record this instance, because no unique UMLS concept ID can be assigned to the homonym.
Our program, Negfinder, consists of several components that operate in pipeline fashion. That is, each step uses the output of the preceding step as its input. The steps are as follows:
Negfinder was designed and trained with a set of 40 medical documents from diverse specialties (radiology reports, surgery notes, discharge summaries). It was then evaluated with a test set of 60 documents (30 discharge summaries, 30 surgical notes) in which the marked-up text was inspected visually to quantify false-positive and false-negative results. To estimate the effect of priming bias caused by examining previously marked negations, we did a second evaluation with 10 documents (5 discharge summaries, 5 surgical notes) where the original (unmarked) text was inspected. Radiology notes were not used for testing because, in our sample data, they turned out to be the easiest with respect to negation identification, yielding almost 100 percent results with just a few simple rules. As discussed later, discharge summaries, especially in psychiatry, proved to be the most challenging. The length of the notes in the test set varied between 600 to 2,000 words, with a sentence length varying between 3 and 26 words. Discharge summaries tended to be longer than surgery notes and also tended to have longer sentences.
The approach used to recognize UMLS concepts in the narrative has been described in detail earlier25 and is summarized here (along with a brief description of recent enhancements and changes necessary to handle negation). We use a phrase-recognizing utility that is bundled with a commercial package (IBM Intelligent Miner for Text26) to identify phrases of potential interest, which we then attempt to match to concepts in a subset of the UMLS Metathesaurus, 2000 Edition, that resides in a Microsoft SQL Server relational database management system. Negfinder also writes its output to the same database. The phrase recognizer uses a customizable stop-word list. (Stop words are very common words in English natural language that are not useful for indexing by themselves, such as common pronouns and conjunctions.) Some words that flag negations, e.g., “denies” and “absent,” also occur in the UMLS and must be filtered out initially, so that the phrases to be recognized do not contain any stop word or negation word.
The concept recognition algorithm seeks to match an entire phrase (up to five non-stop-words long) uniquely to a UMLS concept. If a match to the entire phrase is not obtained, it then attempts to match subsets of the words in the phrase. Ambiguous-term and ambiguous-string lists of the UMLS are used to ensure that the phrase being matched is not homonymous. (If it is, we treat it as a “pseudo-concept” and code it slightly differently from identified concepts.) The current algorithm has been modestly refined over the original to improve specificity and speed.
The algorithmic details of input transformation are straightforward and will not be described. A segment of the transformed input is illustrated along with the original text in Figure 1. The figure shows that, while many non-stop-words and -phrases are matched to UMLS concepts, some are not. This can happen for several reasons—the word does not occur in UMLS (Sarajevo), an exact concept match to a phrase cannot be found (EEG Activity), or the word is an “undiscovered homonym,” where multiple matches occur that cannot be disambiguated using the current versions of the UMLS homonym tables because they have not been recorded as such. Examples of undiscovered homonyms are rub, S1, and S2. Rub can refer to a friction sound (e.g., pleural or pericardial rub), or a physical action, while S1 and S2 can refer to the sacral vertebrae, the corresponding spinal nerves, or to the first and second heart sounds.
In our pilot work, we asked medically trained observers to exhaustively mark all instances of negation on randomly selected discharge summaries, surgical notes, and radiology reports. The purpose of this was twofold:
On the basis of this study, we decided to include only independent words implying the total absence of a concept or thing in the current situation. Words signifying temporal transitions such as stopping or discontinuing a drug were not treated as instances of negation, nor were instances in which a concept itself had a negative connotation, such as “akinesia.”
Most of the negations did turn out to be straightforward, and the words or phrases indicating negation (NegPs) were usually in close proximity to the concepts they negated, as we had expected. Nevertheless, several complexities had to be dealt with as shown below (NegPs are shown in italics and negated elements in bold):
These observations made it apparent that employing the lexical scanning and LR parsing tools used for parsing computer languages was feasible but that a great deal of customization would be required to handle all the complexities.
We built Negfinder's lexer and parser using a software package called Visual Parse++ (SandStone Technology, La Jolla, California). Visual Parse++, which runs under the Windows operating system, incorporates the functionality of the Unix utilities lex and yacc. Visual Parse++ is language independent and incorporates a visual debugging environment, allowing interactive debugging of a grammar specification. On the negative side, differences in the specification language (compared with lex/yacc) make porting of existing lex/yacc specifications a non-trivial chore; in this context, a Visual Parse++ lexer specification tends to be considerably larger than the equivalent lex specification.
The main task of the lexer in our application is to identify words or phrases that signify negation and pass them as special tokens to the parser. This has to be followed by the more difficult task of deciding which particular concepts were negated.
On the basis of an initial training set of medical documents including discharge summaries, surgical notes, and radiology reports, we have set up our lexer to recognize more than 60 distinct words or patterns that express negation, such as “absent” or “not visualized.” Combinations and inflections of these forms are also recognized. Some of the patterns (such as words ending in “n't”) match multiple negative words, and the lexer handles interspersed adverbial forms such as “not currently visualized.” As a result, the number of distinct NegPs recognized by the lexer easily runs into the thousands. A specific token is passed to the parser to represent the exact way in which the NegP is used for negation. Unique tokens are generated for combinations of all the following characteristics of NegPs:
When it encounters words that usually end the scope of negations, the lexer also outputs a “negation-termination” token. This prevents the parser from mislabeling subsequent concepts as being negated. Common negation terminators are:
Since it detects large numbers of NegPs and negation terminators, Negfinder's lexer does much more work than lexers used for traditional programming languages, which typically return tokens corresponding to one or two words at most. The Negfinder lexer often returns a single token that corresponds to several possible combinations of words. Thus, combinations of is/was/were/are/been followed by an optional adverb followed by denied/refused/omitted/lacking/excluded will generate a single token that is passed to the parser. For this purpose, the lexer makes extensive use of regular expressions in addition to using limited part-of-speech information as described above.
Negfinder's parser uses a grammar far simpler than those traditionally used for NLP, because it makes no attempt to parse the sentence structure in detail. Rather, it focuses on the occurrence of concepts matched to the UMLS, negation signals, negation terminators, and sentence terminators and treats most other words merely as fillers. The main task of the parser is to assemble contiguous concepts into a list, if required, to associate a concept or a list of concepts with a negative phrase that either precedes or follows it to form a negation, and to accurately determine where the negation starts and ends. A flavor for the kinds of grammar rules used and the types of difficulties that the parser encounters are given below.*
The first of the tasks listed above—that is, to assemble contiguous concepts into a list,—is accomplished by using the following type of grammar rule:
conceptlist : conceptlist ‘,’ concept.
This rule recursively allows a concept list to incorporate two or more contiguous concepts separated by commas. As discussed above, these rules take into account punctuation (such as commas) and conjunctions (such as “or” and “and”) that may constitute part of a concept list. The situation is made more challenging by the fact that the parser has to cater to those cases in which some concepts are not recognized and coded but are still parts of negated lists. Thus, in a sequence of the type “no filler1, filler2 or concept1,” the parser still has to recognize that concept1 is negated, even though no formal concept list, as defined above, has been started. To account for such cases, a total of 16 such rules are required for this function alone.
Determining when a negation begins and ends can be a difficult problem for a simple parser such as Negfinder's that does not analyze complex sentence structure. A perfect solution would require accurate sentence segmentation and perhaps a rich semantic model to identify cases in which the concept being negated is remote from the negation signal or separated from it by a long phrase or clause. Our parser relies on the list of “negation terminators” as described above. When these do not occur, the parser arbitrarily relies on a window of three intervening filler words (not concepts or NegPs) to terminate the current instance of negation or concept-list formation. (The choice of three words was found to provide a suitable tradeoff of sensitivity vs. specificity.) Thus, two concepts are not considered to be part of a list if they are separated by more than three filler words, and a concept is considered to be negated only if the NegP does not precede or follow it by more than three intervening filler words. As the results show, this simplistic strategy does fairly well in practice in medical documents.
The final output of Negfinder consists of an entry for each concept—its concept ID, its starting and ending byte offsets within the document, the presence of negation, and whether the negation represents a compound concept. This output is similar to the data that are stored in proximity indexes generated by information retrieval engines when individual words are indexed.
Two separate evaluations were carried out. In the first, a set of 60 test documents was passed through Negfinder and the marked-up documents were distributed between three observers,†who were assigned the following tasks:
The first task yielded the number of true-positive results (TP) and false-positive results (FP), and the second one identified the false-negative results (FN). The true-negative results (TN) were calculated as the number of concepts identified in the documents that were not negated.
From these numbers, the specificity (TPx100/ (TP+FP)) and sensitivity (TPx100/(TP+FN)) of Negfinder could be determined.
Since the identification of negations on already marked-up documents could introduce a bias, a second evaluation was carried out using a different set of ten documents that were independently checked for negations by the primary reviewer and Negfinder, and the results were compared. (A smaller number of documents were used in this evaluation because of the significantly increased cost of evaluation.)
The difference between the results of the two evaluations, however, barely missed statistical significance (P=0.0517 by chi-square test), indicating that the differences might have been due to chance. If we assume, however, that the observed difference in specificity is real, it may be accounted for partly by differences in the content of the two document collections and partly by priming bias during the first experiment. The relative contribution of each factor, however, is hard to quantify.
In this paper we have described Negfinder, a program that uses tools designed for parsing computer languages to identify negations in medical documents. There is no doubt that these tools are not powerful enough to handle the general problem of natural language understanding, which is extremely difficult. Nevertheless, the current work shows that, as we had hypothesized, the detection of negation in medical narrative is a constrained problem that is amenable to lexical scanning using a finite state machine and to parsing using the restricted LALR(1) type of CFG.
Negfinder finesses an important limitation of LALR(1) grammars—the problem of single token look-ahead—by identifying and passing complex negating phrases, such as “could not be specifically visualized,” as single tokens and also by the use of states that change the behavior of the lexer on the basis of the occurrence of a negation or concept. Negfinder's lexer generates a finite state machine that is several orders of magnitude larger than its parser, and this is where most of Negfinder's complexity resides. Despite the simplicity of the grammar and the lack of extensive analysis of sentence structure, Negfinder seems to do a fairly adequate job of identifying negations in medical narrative.
The results show that Negfinder has a sensitivity and specificity between 91 and 96 percent in detecting negations in medical documents. The slightly worse specificity values on our second evaluation suggests some evidence of priming bias when the negations are marked up by Negfinder as opposed to being independently discovered by a human observer. Nevertheless, the results are not substantially worse (sensitivity unchanged, specificity lower by 6 percent, but still about 92 percent) in the independent mark-up case; as previously stated, the differences between the two evaluations did not reach statistical significance.
An interesting side observation of the second evaluation was that, even though the human observer took great care in proofreading the ten notes, Negfinder still flagged a few obvious negations that had been overlooked by him, and these were comparable in number to the more difficult ones that it missed. This means that Negfinder's sensitivity was as good as that of the human observer on this very tedious task. This observation indicates that it might be useful to use a program like Negfinder to carry out an automatic color mark-up of dictated medical documents before they are approved by the dictating physician, to highlight negations and make sure that they have been accurately transcribed (especially by medical dictation systems that use automated speech recognition technology).
The errors made by Negfinder can be classified as follows. (In the examples, concept strings are shown in bold, negated concept strings in bold italics, and negating phrases in italics):
In most cases, errors made by Negfinder are easily correctable by syntactic methods and involve minor modifications of the lexer or parser. However, in some cases semantic methods may be required, such as better characterization of temporary composite concepts using noun phrase detection combined with a rich semantic model of the domain.
Analysis of the negating phrases shows that a small number of them are very common and a relatively large number appear only occasionally. In our test material, four words appear in 92.5 percent of the negations. These are “no” (49.3 percent), “denies/denied” (20.5 percent), “not” (12.8 percent) and “without” (9.8 percent).
At first glance, this would seem to indicate that a relatively simple parser that recognized these four NegPs alone could be quite accurate. However, further examination shows that this is not the case. First, “no” includes several variations, such as “no evidence of,”“no sign of,” and “no history of.” Second, quite often a single word negates a large number of concepts as a list, as in “Patient denies shortness of breath, chest pain, fever, chills, nausea, vomiting, diarrhea, and abdominal pain.” Thus, a parser must be able to detect concept lists negated by a single NegP. Without this capability, the simple parser mentioned above would catch only 67.8 percent of negated concepts in our test set.
Third, more than 50 percent of the time that the word “not” occurs, it does not negate a concept at all but only a succeeding verb. Thus, “dyspnea is not seen” is quite different from “dyspnea did notimprove,” as in the first instance “dyspnea” is negated, whereas in the second it is not. Thus, Negfinder incorporates large lists of verbs like “see” or “show” that are used to negate concepts and also notes whether they precede or follow the negated concept (“fracture is not seen on X-ray” vs “X-ray does not showfracture”). Such detailed knowledge is the basis of the high sensitivity and specificity shown by Negfinder.
The sophistication of negation in our material was much greater in the discharge summaries than in the surgical notes or radiology reports. (In our first test set of 60 documents, which had equal numbers of discharge summaries and surgical notes, the discharge summaries accounted for the vast majority of errors—34 of 40 errors, or 85 percent.) Hence it is expected that Negfinder should do better on surgical notes and radiology reports.
One weakness of our current concept recognition system in recognizing noun phrases affects detection of the beginning and end of negations. Thus, when a phrase such as “shortness of breath” was negated, Negfinder marked the words “shortness of” and the concept “breath” as negated. However, this is not quite right, since “breath” is not being negated at all, but rather the compound concept “shortness of breath.” If “shortness of breath” is passed to Negfinder as a single concept, then it is correctly flagged as negated. This emphasizes the need for better recognition of compound concepts, perhaps by using a true noun-phrase detection program.
Several enhancements to both negation detection as well as concept recognition have suggested themselves in the course of the evaluations described in this work, and we plan to address these in the near future. As described previously,25 certain other enhancements are required in the phrase recognizer, which is not smart enough to split phrases on verbs, so that it sometimes yields curious pseudo-phrases such as “lower extremity wrapped with ACE bandage” (phrase italicized).
In the next version of the recognizer, we plan to incorporate lookup of the specialist lexicon,27 which is distributed along with the UMLS and contains part-of-speech information about most of the common words in English and their inflected forms. In the above example, this will allow the word “wrapped” to be recognized as a past-participial form that is unambiguously a verb (as opposed to the finite form, “wrap,” which can also be a noun), and this information can be used to split the pseudo-phrase into smaller parts. It remains to be seen to what extent this approach will improve concept matching. We may need to consider the use of a stochastic tagger to resolve the noun-vs.-verb ambiguity, in addition to the existing IBM text analysis tool. (In its defense, the latter taps into a large database of place, person, and organization names and identifies most of them correctly as such, so the developer can easily eliminate many words from further consideration.) Combining the input from both taggers is an interesting challenge.
A final issue is that many UMLS concepts themselves represent antonymous forms of other concepts, e.g., words beginning with “anti-,” “an-,” “un-,” and “non-.” Such forms are not necessarily negations. (Thus, an anti-epileptic drug is used when epilepsy is present; “non-smoker,” however, is a true negation.) Antonym information is not currently recorded in the MRREL table of UMLS. It is possible that antonym information is not useful for query-broadening strategies that use negation, since clinicians who query a system would probably specify an antonym prefix explicitly in the query. This issue, however, requires further experimentation and feedback from users.
The authors thank Prof. Carol Friedman PhD, of Columbia University and the City University of New York, for making a demo version of MedLEE publicly accessible at http://cat.cpmc.columbia.edu/medleexml/demo/. They also thank the reviewers of the earlier version of this paper, as their comments helped to greatly improve its quality.
This work was supported by National Institutes of Health grant R01 LM06843-01 from the National Library of Medicine.
*More details are available online, at http://www.jamia.org, as appendixes to this article.
†The observers were the three authors. The first and third authors (PGM and PMN) were the developers of Negfinder, while the second author (AD) was the primary reviewer. The final tally of the results was done only after all three observers conferred and discussed their findings, which were then re-examined by the primary reviewer, taking into account types of errors found or missed by all the observers. Although the reliability of the observers was not measured, this iterative process was found to considerably increase the reliability of the error detection.