This study applied dependency parses to negation detection. Given a “focal” named entity in clinical text, we seek to determine whether that named entity is negated. To do so with dependency parses, we first examined a development (i.e., training) set of sentences from the I2B2/VA NLP challenge and manually designed syntactic rules which indicated when terms were negated. We then compared our dependency parser-based approach against the existing cTAKES negation module.
The cTAKES pipeline performs standard text pre-processing on clinical text (refer to [2] for details) and thus satisfies the pre-conditions for a dependency parser. Recall that dependency parsing is defined on pre-tokenized sentences, typically annotated with POS tags and lemmas. cTAKES includes a lemmatizer based on the LVG Specialist Lexicon³ and the Maximum Entropy-based POS tagger from OpenNLP⁴. Thus, in this pipeline, the outputs of a dependency parser and of named entity recognition (NER) for candidate entities such as signs/symptoms and diseases/disorders (both modules shown in bold in the pipeline block diagram) are available to the negation detection module.
A block diagram of modules in the cTAKES pipeline.
We based negation rules on the dependency path between the focus (i.e., the named entity) and each other term in the sentence (i.e., each hypothesized negation word). Dependency paths can easily be calculated between any two nodes in a dependency tree. For example, if “edema” is a focus term discovered by the dictionary lookup module, the dependency path between “no” and “edema” is simply a subset of the dependency parse.
This dependency path can also be represented as a string, which makes path matching straightforward to implement. In this paper, however, we use the graphical representation for clarity.
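The paper does not specify its parse data structure or its path-string format, so the following Python sketch assumes both. It stores a parse as a list of tokens with head indices, finds the lowest common ancestor of two tokens, and serializes the path between them using a hypothetical notation: `-rel->` for an arc climbed from a dependent to its head, and `<-rel-` for an arc descended from a head to a dependent. The `Token` class, `chain_to_root`, `dependency_path`, and the example parse are illustrative names, not cTAKES APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str    # surface word
    head: int    # index of the head token; -1 marks the root
    deprel: str  # label of the arc from this token to its head


def chain_to_root(tokens: List[Token], i: int) -> List[int]:
    """Indices from token i up to the root: [i, head(i), head(head(i)), ...]."""
    chain = [i]
    while tokens[chain[-1]].head != -1:
        chain.append(tokens[chain[-1]].head)
    return chain


def dependency_path(tokens: List[Token], src: int, dst: int) -> str:
    """Path string from token src to token dst through their lowest common ancestor.

    "-rel->" is an arc climbed from a dependent to its head;
    "<-rel-" is an arc descended from a head to a dependent.
    """
    up = chain_to_root(tokens, src)
    down = chain_to_root(tokens, dst)
    down_set = set(down)
    # The first node on src's chain that also lies on dst's chain is the LCA.
    lca = next(n for n in up if n in down_set)

    parts = [tokens[src].form.lower()]
    # Climb from src to the LCA.
    for child in up[: up.index(lca)]:
        parts += [f"-{tokens[child].deprel}->", tokens[tokens[child].head].form.lower()]
    # Descend from the LCA to dst (dst's climb reversed, LCA excluded).
    for child in reversed(down[: down.index(lca)]):
        parts += [f"<-{tokens[child].deprel}-", tokens[child].form.lower()]
    return " ".join(parts)


# "did not show edema" with a hypothetical Stanford-style parse.
sent = [Token("did", 2, "aux"), Token("not", 2, "neg"),
        Token("show", -1, "root"), Token("edema", 2, "dobj")]
print(dependency_path(sent, 1, 3))  # not -neg-> show <-dobj- edema
```

Because the path runs through the lowest common ancestor, it never revisits a node, and the resulting string can be matched directly with regular expressions.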
To use the cTAKES named entities as focus annotations for DepNeg, we followed the steps below.
- Find the head node. For each focus annotation, we first reduced multi-word foci to their highest node (the headword), which represents the whole span, since dependency paths are undefined on multi-word expressions. For example, if “rib fracture” was the focus annotation in “No evidence of rib fracture,” we would choose “fracture” as the representative word (assuming “rib” is a dependent of “fracture”).
- Calculate paths. Given the representative word of the named entity, we considered each other token in the sentence and calculated its dependency path to the focus.
- Match paths. For each token-to-focus dependency path, we evaluated whether the path matches a pattern in a bank of acceptable dependency path patterns (see below). If any path matches, the named entity is negated.
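The three steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the DepNeg implementation: it assumes a parse stored as tokens with head indices and a hypothetical path-string notation (`-rel->` climbs to a head, `<-rel-` descends to a dependent). The five regular expressions in `PATTERN_BANK` are invented for illustration and are not the DepNeg patterns or the cTAKES keyword lists; `head_of_span` and `is_negated` are likewise hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    form: str    # surface word
    head: int    # index of the head token; -1 marks the root
    deprel: str  # label of the arc to the head


def chain_to_root(toks: List[Token], i: int) -> List[int]:
    chain = [i]
    while toks[chain[-1]].head != -1:
        chain.append(toks[chain[-1]].head)
    return chain


def dependency_path(toks: List[Token], src: int, dst: int) -> str:
    """Hypothetical path string: "-rel->" climbs to a head, "<-rel-" descends."""
    up, down = chain_to_root(toks, src), chain_to_root(toks, dst)
    down_set = set(down)
    lca = next(n for n in up if n in down_set)  # lowest common ancestor
    parts = [toks[src].form.lower()]
    for child in up[: up.index(lca)]:
        parts += [f"-{toks[child].deprel}->", toks[toks[child].head].form.lower()]
    for child in reversed(down[: down.index(lca)]):
        parts += [f"<-{toks[child].deprel}-", toks[child].form.lower()]
    return " ".join(parts)


# Step 1: the span token whose head lies outside the span is its highest node.
def head_of_span(toks: List[Token], span: List[int]) -> int:
    inside = set(span)
    return next(i for i in span if toks[i].head not in inside)


# Illustrative pattern bank (NOT the DepNeg patterns or the cTAKES keyword lists).
PATTERN_BANK = [re.compile(p) for p in (
    r"no -det-> \S+",                         # "no edema"
    r"no -det-> \S+ <-prep- of <-pobj- \S+",  # "no evidence of fracture"
    r"(?:not|never) -neg-> \S+ <-dobj- \S+",  # "did not show edema"
    r"(?:denies|denied) <-dobj- \S+",         # "denies pain"
    r"without <-pobj- \S+",                   # "without edema"
)]


def is_negated(toks: List[Token], focus_span: List[int]) -> bool:
    focus = head_of_span(toks, focus_span)              # Step 1: find the head node
    for i in range(len(toks)):
        if i == focus:
            continue
        path = dependency_path(toks, i, focus)          # Step 2: calculate paths
        if any(p.fullmatch(path) for p in PATTERN_BANK):  # Step 3: match paths
            return True
    return False


sent = [Token("No", 1, "det"), Token("evidence", -1, "root"),
        Token("of", 1, "prep"), Token("rib", 4, "nn"), Token("fracture", 2, "pobj")]
print(head_of_span(sent, [3, 4]))  # 4 ("fracture")
print(is_negated(sent, [3, 4]))    # True
```

The loop returns at the first hit, mirroring the rule that a single matching path is enough to negate the named entity.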
Dependency Path Patterns
The dependency path patterns were developed by examining a small subset of data from the 2010 I2B2/VA NLP challenge, which we ran through the whole cTAKES pipeline, including the dependency parser. The negation patterns on dependency paths fall into several types of syntactic constructions that can result in the negation of a named entity:
- Negated Verbs (3 patterns). Negating a verb (e.g., “did not show…”) usually implies that the whole verb phrase is negated, and thus that objects or complements should be negated.
- Negative Verbs (3 patterns). Some verbs (e.g., “denies”) inherently negate their direct object. A special case also exists in which the verb is negative only if a particle is present (e.g., “rules out”).
- Negative Prepositions (1 pattern). A few prepositions (e.g., “without”) will negate the object of the preposition.
- Negated Nouns (2 patterns). Certain negating determiners (e.g., “no”) will negate the nouns they modify.
- Negative Adjectives (3 patterns). Some adjectives (e.g., “negative”) will negate the nouns they modify.
- Conjunction Expansion. This rule is a meta-rule that allows conjunctions or lists of any kind to be used within every other rule.
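The Conjunction Expansion meta-rule can be illustrated by wrapping each base pattern so that the focus may also be reached through extra `conj` arcs. The sketch assumes a hypothetical path-string notation (`-rel->` climbs from a dependent to its head, `<-rel-` descends from a head to a dependent) and Stanford-style `conj` labels; the base patterns and the helper names `expand_conjunctions` and `matching_rules` are illustrative, not the DepNeg rule set.

```python
import re
from typing import Dict, List

# Hypothetical base patterns over path strings in which "-rel->" climbs from a
# dependent to its head and "<-rel-" descends from a head to a dependent.
BASE_PATTERNS: Dict[str, str] = {
    "negated_noun": r"no -det-> \S+",
    "negative_verb": r"(?:denies|denied) <-dobj- \S+",
    "negative_preposition": r"without <-pobj- \S+",
}

# Zero or more conjunct hops appended at the focus end: "X <-conj- Y <-conj- Z".
CONJ_TAIL = r"(?: <-conj- \S+)*"


def expand_conjunctions(pattern: str) -> "re.Pattern[str]":
    """Meta-rule: let the focus be reached through any number of conj arcs."""
    # Group the base pattern so a top-level alternation cannot swallow the tail.
    return re.compile(f"(?:{pattern}){CONJ_TAIL}")


EXPANDED = {name: expand_conjunctions(p) for name, p in BASE_PATTERNS.items()}


def matching_rules(path: str) -> List[str]:
    return [name for name, rx in EXPANDED.items() if rx.fullmatch(path)]


# "no edema or effusion": "effusion" is a conj dependent of "edema".
print(matching_rules("no -det-> edema <-conj- effusion"))  # ['negated_noun']
```

This sketch expands only at the focus end of the path. Conjunctions elsewhere in a path (for example, between coordinated verbs) would need `conj` hops at other positions as well.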
DepNeg negation patterns based on dependency paths, with examples
These pattern rules were implemented as regular expressions over the string representation of paths, but are presented in graphical form in the pattern table. The graphical patterns resemble the notation we have used for dependency trees and dependency paths, with two differences. First, bold or italicized text represents a set of terms (cross-indexed with the keyword table), and an acceptable path must draw its term from that set. Second, blank nodes or relations indicate that any term or relationship is acceptable, provided the other conditions are satisfied.
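To show how a graphical pattern might be turned into a regular expression over path strings, the Python sketch below compiles a sequence of nodes and edges: a named term set becomes an alternation over its members, and a blank node or blank relation becomes a wildcard. The term-set names and contents, the `compile_pattern` helper, and the path notation (`-rel->` climbs to a head, `<-rel-` descends to a dependent) are all hypothetical; the actual DepNeg regular expressions are not given in this section.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical term sets standing in for the bold/italic sets in the pattern table;
# these are NOT the actual cTAKES keyword lists.
TERM_SETS = {
    "NEG_DET": {"no"},
    "NEG_VERB": {"deny", "denies", "denied"},
    "NEG_PREP": {"without"},
}

ANY_TERM = r"\S+"    # blank node: any term
ANY_REL = r"[\w:]+"  # blank relation: any label

Node = Optional[str]              # term-set name, or None for a blank node
Edge = Tuple[str, Optional[str]]  # ("up" | "down", relation, or None if blank)


def node_regex(node: Node) -> str:
    if node is None:
        return ANY_TERM
    return "(?:" + "|".join(re.escape(t) for t in sorted(TERM_SETS[node])) + ")"


def edge_regex(edge: Edge) -> str:
    direction, rel = edge
    label = ANY_REL if rel is None else re.escape(rel)
    # "-rel->" climbs to a head; "<-rel-" descends to a dependent.
    return f" -{label}-> " if direction == "up" else f" <-{label}- "


def compile_pattern(nodes: List[Node], edges: List[Edge]) -> "re.Pattern[str]":
    """Turn a graphical pattern (node, edge, node, ..., focus) into a regex."""
    if len(nodes) != len(edges) + 1:
        raise ValueError("need exactly one more node than edges")
    parts = [node_regex(nodes[0])]
    for edge, node in zip(edges, nodes[1:]):
        parts += [edge_regex(edge), node_regex(node)]
    return re.compile("".join(parts))


negative_verb = compile_pattern(["NEG_VERB", None], [("down", "dobj")])
negated_noun = compile_pattern(["NEG_DET", None], [("up", None)])
print(bool(negative_verb.fullmatch("denies <-dobj- pain")))  # True
print(bool(negated_noun.fullmatch("no -det-> edema")))       # True
```

Keeping the term sets separate from the pattern skeletons lets the same skeleton be reused with a different keyword list, which suits reusing an existing set of keywords such as those of the cTAKES negation module.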
cTAKES negation module keywords, reused in DepNeg
The existing cTAKES negation module uses negation words and other keywords in its term pattern matching (see the keyword table above). To isolate the contribution of dependency parses to negation detection, we used the same sets of negation words and keywords in the DepNeg dependency path patterns. Note, however, that although we have motivated the rules by their syntactic behavior, the negation words and other keywords used in the existing cTAKES negation module are not entirely consistent with, or optimal for, a dependency-based negation paradigm. The syntactic categories or POS tags in the set names may therefore at times be misnomers: INNEG, for example, should be the set of negative prepositions, but it includes words such as “absent” and “none”, which are not prepositions.